In this paper, we study the problem of pointwise estimation of a multivariate function. We
develop a general pointwise estimation procedure that is based on selection of estimators from 
a large parameterized collection. An upper bound on the pointwise risk is established and it is 
shown that the proposed selection procedure specialized for different collections of estimators 
leads to minimax and adaptive minimax estimators in various settings. 
1. Introduction

In this paper, we study the problem of pointwise nonparametric estimation of an unknown
function $F : \mathbb{R}^d \to \mathbb{R}$ in the multidimensional Gaussian white noise model

$$Y(dt) = F(t)\,dt + \varepsilon W(dt), \qquad t = (t_1, \ldots, t_d) \in \mathcal{D}, \tag{1}$$

where $\mathcal{D}$ is an open interval in $\mathbb{R}^d$ containing $\mathcal{D}_0 := [-1/2, 1/2]^d$, $W$ is the standard
Wiener process in $\mathbb{R}^d$ and $0 < \varepsilon < 1$ is the noise level. Our goal is to estimate $F$ at
a given point $x \in \mathcal{D}_0$ using the observation $\mathcal{Y}_\varepsilon := \{Y(t), t \in \mathcal{D}\}$. We assume that the
observation set $\mathcal{D}$ is larger than $\mathcal{D}_0$ in order to avoid boundary effects. Such assumptions
are rather common in multivariate nonparametric models (see, e.g., Chen (1991), Hall
(1989)).

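Accuracy of an estimator $\hat F(x) = \hat F(x; \mathcal{Y}_\varepsilon)$ is measured by the risk

$$\mathcal{R}_r[\hat F; F] := \{\mathbb{E}_F |\hat F(x) - F(x)|^r\}^{1/r}, \qquad r > 0,$$

where $\mathbb{E}_F$ denotes the expectation with respect to the distribution $\mathbb{P}_F$ of $\mathcal{Y}_\varepsilon$ satisfying
(1).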



This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 
2008, Vol. 14, No. 4, 1150-1190. This reprint differs from the original in pagination and 
typographic detail. 



1350-7265 © 2008 ISI/BS 



Universal pointwise selection rule in multivariate function estimation 






We develop a pointwise estimation procedure that is based on the selection of estimators
from a large collection.

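Denote by $\mathfrak{K}$ the set of all kernels, that is, functions $K : \mathcal{D} \times \mathcal{D}_0 \to \mathbb{R}$ such that
$\int_{\mathcal{D}} K(t, x)\,dt = 1$ for all $x \in \mathcal{D}_0$. Let $\mathcal{K}$ be a given subset of $\mathfrak{K}$ and let $\mathcal{F}(\mathcal{K})$ be the corresponding
collection of linear estimators of $F(x)$ associated with the family $\mathcal{K}$:

$$\mathcal{F}(\mathcal{K}) := \Bigl\{ \hat F_K(x) = \int K(t, x)\,Y(dt),\ K \in \mathcal{K} \Bigr\}. \tag{2}$$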
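The linear estimator in (2) can be illustrated numerically. The following is a minimal sketch, assuming $d = 1$, a box kernel, and a discretization of the white noise model into cells of width `dt`; the function names, the grid and the test function are all hypothetical choices made for illustration and are not part of the paper.

```python
import math
import random


def box_kernel(u):
    """Box kernel on [-1/2, 1/2]; it integrates to one."""
    return 1.0 if abs(u) <= 0.5 else 0.0


def simulate_increments(F, eps, grid, dt, rng):
    """Increments Y(dt) = F(t) dt + eps W(dt) on cells of width dt."""
    return [F(t) * dt + eps * rng.gauss(0.0, math.sqrt(dt)) for t in grid]


def linear_estimate(increments, grid, x, h, kernel=box_kernel):
    """Discretized version of (2): sum over cells of h^{-1} K((t - x)/h) Y(dt)."""
    return sum(kernel((t - x) / h) / h * dy for t, dy in zip(grid, increments))


n = 4000
dt = 2.0 / n
grid = [-1.0 + (i + 0.5) * dt for i in range(n)]  # cell midpoints in (-1, 1)
rng = random.Random(0)
F = lambda t: math.cos(3.0 * t)                   # hypothetical signal

dY = simulate_increments(F, eps=0.05, grid=grid, dt=dt, rng=rng)
estimate = linear_estimate(dY, grid, x=0.1, h=0.2)
```

With `eps = 0` and a constant signal, the estimate reproduces the constant, which reflects the normalization $\int K(t, x)\,dt = 1$ required of every kernel.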

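In this paper, we propose an estimator of $F(x)$ that is based on random (measurable with
respect to $\mathcal{Y}_\varepsilon$) selection from the collection $\mathcal{F}(\mathcal{K})$. Denoting this estimator by $\hat F_{\mathcal{K}}(x)$, we
have

$$\hat F_{\mathcal{K}}(x) = \hat F_{\hat K}(x),$$

where $\hat K \in \mathcal{K}$ for any "frozen" trajectory $\mathcal{Y}_\varepsilon$. Although $\hat F_{\mathcal{K}}(x)$ can be constructed for
any $\mathcal{K} \subset \mathfrak{K}$, we establish the upper bound on its risk only for $\mathcal{K} \in \mathcal{P}(\mathfrak{K})$; here, $\mathcal{P}(\mathfrak{K})$ is
the set of all collections $\mathcal{K}$ satisfying some natural and non-restrictive conditions (see
(K0)-(K2) in Section 2). We then prove (Theorem 1) that for all $\varepsilon$ small enough and for
any $\mathcal{K} \in \mathcal{P}(\mathfrak{K})$,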

$$\mathcal{R}_r[\hat F_{\mathcal{K}}; F] \le U_{\mathcal{K},F}(x) \qquad \forall F \in \mathbb{F}(\mathcal{K}), \tag{3}$$

where the upper bound $U_{\mathcal{K},F}(x)$ is completely determined by the function $F$ and by
the family of kernels $\mathcal{K}$. Here, $\mathbb{F}(\mathcal{K})$ is a large nonparametric set whose dependence on
$\mathcal{K}$ is typically weak. In particular, in most interesting examples, we have $\mathbb{F}(\mathcal{K}) \supset C_b(\mathcal{D})$
(see Remark 6 and Theorem 1 below), where $C_b(\mathcal{D})$ is the set of all uniformly bounded
continuous functions.

It is important to emphasize that our selection procedure can be applied to different
collections of kernel estimators. Thus, we derive estimators $\{\hat F_{\mathcal{K}}, \mathcal{K} \in \mathcal{P}(\mathfrak{K})\}$ with different
statistical properties as an output of a unique computational routine.

Kernel collections. Consider several examples of kernel collections for which the upper
bound (3) can be established. Here and later on, $K : \mathbb{R}^d \to \mathbb{R}$ is a fixed function and for
all $u, v \in \mathbb{R}^d$, we understand $u/v$ as $(u_1/v_1, \ldots, u_d/v_d)$.

Example 1. Let $d = 1$ and for any $x \in \mathcal{D}_0$, let

$$\mathcal{K}_1 = \Bigl\{ h^{-1} K\Bigl(\frac{\cdot - x}{h}\Bigr),\ h \in [h_{\min}, h_{\max}] \Bigr\},$$

where $0 < h_{\min} < h_{\max} < 1$ are given real numbers.

A random choice from this collection leading to a data-driven bandwidth $\hat h_\varepsilon(x) =
\hat h(x, \mathcal{Y}_\varepsilon)$ was proposed in Lepski et al. (1997). The upper bound of type (3) obtained
in that paper was used in order to establish minimax results on the Besov classes of
functions. We note that the estimator $\hat F_{\hat K}$, $\hat K(t, x) = \hat h_\varepsilon(x)^{-1} K([t - x]/\hat h_\varepsilon(x))$, constructed
in Lepski et al. (1997) and the estimator $\hat F_{\mathcal{K}_1}$ developed in this paper are different.
Example 2. Consider a generalization of the above collection $\mathcal{K}_1$ to an arbitrary dimension
$d > 1$. Let $\mathcal{H} = \bigotimes_{i=1}^d [h_i^{\min}, h_i^{\max}]$ and

$$\mathcal{K}_{\mathcal{H}} = \Bigl\{ \Bigl[\prod_{i=1}^d h_i\Bigr]^{-1} K\Bigl(\frac{\cdot - x}{h}\Bigr),\ h \in \mathcal{H} \Bigr\}. \tag{4}$$

A sophisticated random choice from the collection $\mathcal{K}_{\mathcal{H}}$ was proposed in Kerkyacharian et al.
(2001). The corresponding upper bound of the type (3) allowed minimax results to be
obtained on the anisotropic Besov classes of functions (functions with inhomogeneous
smoothness). Again, we note that the estimator constructed in Kerkyacharian et al.
(2001) and the estimator $\hat F_{\mathcal{K}_{\mathcal{H}}}$ proposed in the present paper are different.



Even though $\mathcal{K}_{\mathcal{H}}$ is a rather rich collection, it is not "sufficiently" rich for many interesting
statistical problems. The next example illustrates this point.

Example 3. Denote by $\mathcal{E}$ the set of all $d \times d$ orthogonal matrices and let

$$\mathcal{H}_1 = \{h \in \mathbb{R}^d : h = (h_1, h_{\max}, \ldots, h_{\max}),\ h_1 \in [h_{\min}, h_{\max}]\}.$$

Consider the kernel collection 

$$\mathcal{K}_{SI} = \Bigl\{ \Bigl[\prod_{i=1}^d h_i\Bigr]^{-1} K\Bigl(\frac{E^T(\cdot - x)}{h}\Bigr),\ (h, E) \in \mathcal{H}_1 \times \mathcal{E} \Bigr\}. \tag{5}$$

This collection is appropriate for the estimation of functions possessing the single index 
structure. We refer to Chen (1991), Golubev (1992), Hristache et al. (2001) and references 
therein for works on estimation in the single index model. 



Note that the collections (4) and (5) are quite different and "incomparable". However,
one can easily define a more general collection of kernels that combines (4) and (5).



Example 4. Define

$$\mathcal{K}_{\mathcal{H},\mathcal{E}} = \Bigl\{ \Bigl[\prod_{i=1}^d h_i\Bigr]^{-1} K\Bigl(\frac{E^T(\cdot - x)}{h}\Bigr),\ (h, E) \in \mathcal{H} \times \mathcal{E} \Bigr\}.$$




The estimator $\hat F_{\mathcal{K}_{\mathcal{H},\mathcal{E}}}$ could be applied simultaneously to estimate functions with inhomogeneous
or unknown smoothness as well as functions with the single index structure.



The list of examples of kernel collections corresponding to different "structural" models
(see Stone (1985)) could be continued. Selection from such collections leads to estimators
that adapt simultaneously to a wide spectrum of assumptions on smoothness, structure,
etc. Pointwise adaptive estimators based on selection from specific collections of
estimators were also constructed in Lepski (1990, 1991), Lepski and Spokoiny (1997),
Goldenshluger and Nemirovski (1997), Tsybakov (1998), Klemelä and Tsybakov (2001)
and Golubev (2004). A detailed discussion of relationships between our results and results
in the cited papers is given in Section 3.3.

Objective of the paper. The local inequality (3) specialized to different families of kernels
$\mathcal{K} \in \mathcal{P}(\mathfrak{K})$ allows us to derive minimax and adaptive results in various settings. This is
the feature that characterizes the power of the estimator $\hat F_{\mathcal{K}}$ and the usefulness of the upper
bound in (3). In order to demonstrate the universality of our selection procedure, we discuss
its application to the following nonparametric estimation problems.

(i) Pointwise adaptive estimation in the single index model. Here, we assume that $F(t) = f(\omega^T t)$, where $f : \mathbb{R} \to \mathbb{R}$ is an unknown function, $\omega \in \mathbb{S}^{d-1}$ is an unknown direction vector and $\mathbb{S}^{d-1}$ is the unit sphere in $\mathbb{R}^d$. Suppose, also, that $f$ belongs to the one-dimensional Hölder ball with unknown parameters. The objective is to estimate $F$ at a single given point $x \in \mathcal{D}_0$.

(ii) Pointwise minimax estimation over a union of anisotropic Hölder classes. In this setting, it is assumed that $F$ belongs to the union of anisotropic Hölder classes $\mathbb{H}_d(\alpha, L)$ (see Definition 2) over all $\alpha = (\alpha_1, \ldots, \alpha_d)$ satisfying $\sum_{i=1}^d 1/\alpha_i = 1/\gamma$, where $\gamma > 0$ is a given number. The objective is to estimate $F(x)$ at a given point $x \in \mathcal{D}_0$.

(iii) Global minimax estimation over isotropic Besov classes. Assume that $F$ belongs to the isotropic Besov class. The objective is to estimate $F$ globally on $\mathcal{D}_0$ with small $L_r$-risk, $\mathcal{R}_{L_r}[\hat F; F]$, $r \in [1, \infty)$.

We are not aware of any results on problem (i) reported in the literature. For this
problem, our procedure provides a minimax adaptive estimator in the sense of (6) with $\mathbb{F}_s$
being the one-dimensional Hölder class $\mathbb{H}_1(\alpha, L)$ and the parameter $s = (\alpha, L)$ including
smoothness index $\alpha$ and constant $L$. Thus, in the setup of problem (i), there is no price
to pay for adaptation to the unknown smoothness parameters $\alpha$ and $L$.

Problems (ii) and (iii) were considered in Klutchnikoff (2005) and Kerkyacharian et al.
(2001), respectively. We note, however, that the methods proposed in these papers are
highly specialized and are tailored to the problem in question. In contrast to this, we
arrive at the solution to these problems by applying the same general selection procedure
for different collections of estimators. In particular, our selection procedure applied to
the collection $\mathcal{F}(\mathcal{K}_{SI})$ provides a solution to problem (i). The minimax estimators for
problems (ii) and (iii) are constructed by using the proposed scheme on certain subcollections
of $\mathcal{F}(\mathcal{K}_{\mathcal{H},\mathcal{E}})$. Moreover, we show that all of the problems (i)-(iii) can be solved
simultaneously by the same selection procedure applied to the collection of estimators
$\mathcal{F}(\mathcal{K}_{\mathcal{H},\mathcal{E}})$.

Derivation of minimax and adaptive results. Let us briefly discuss how to derive minimax
and adaptive results from local inequalities of type (3).

In the framework of the minimax approach, $F$ is assumed to belong to some given
set $\mathbb{F}$. The objective is to find an estimator $\hat F$ such that

$$\sup_{F \in \mathbb{F}} \mathcal{R}_r[\hat F; F] \asymp \inf_{\tilde F} \sup_{F \in \mathbb{F}} \mathcal{R}_r[\tilde F; F] \quad \text{as } \varepsilon \to 0,$$

where inf is taken over all possible estimators. Here and in what follows, $a \asymp b$ means
that $0 < c_1 \le a/b \le c_2 < \infty$ for some constants $c_1$ and $c_2$. If, for a fixed family of kernels
$\mathcal{K} \in \mathcal{P}(\mathfrak{K})$, it is shown that $\mathbb{F} \subset \mathbb{F}(\mathcal{K})$ and

$$\sup_{F \in \mathbb{F}} U_{\mathcal{K},F}(x) \asymp \inf_{\tilde F} \sup_{F \in \mathbb{F}} \mathcal{R}_r[\tilde F; F] \quad \text{as } \varepsilon \to 0,$$

then the estimator $\hat F_{\mathcal{K}}$ is minimax on $\mathbb{F}$.

The minimax global results can also be derived from local inequalities of type (3).
Indeed, suppose that we are interested in estimating $F$ with small $L_r$-risk

$$\mathcal{R}_{L_r}[\hat F; F] := \{\mathbb{E}_F \|\hat F - F\|_r^r\}^{1/r}, \qquad r > 0,$$

where $\|\cdot\|_r$ is the standard $L_r$-norm on $\mathcal{D}_0$. Then, by the use of Fubini's theorem, we
obtain from (3) that

$$\mathcal{R}_{L_r}[\hat F_{\mathcal{K}}; F] \le \|U_{\mathcal{K},F}(\cdot)\|_r.$$
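
The Fubini step can be spelled out as follows. This is a sketch of the standard argument, written in the notation used above and assuming that (3) holds at every point $x \in \mathcal{D}_0$; the pointwise risk on the second line is the one evaluated at $x$.

```latex
\begin{aligned}
\bigl(\mathcal{R}_{L_r}[\hat F_{\mathcal{K}}; F]\bigr)^r
  &= \mathbb{E}_F \int_{\mathcal{D}_0} |\hat F_{\mathcal{K}}(x) - F(x)|^r \, dx
   = \int_{\mathcal{D}_0} \mathbb{E}_F |\hat F_{\mathcal{K}}(x) - F(x)|^r \, dx \\
  &= \int_{\mathcal{D}_0} \bigl(\mathcal{R}_r[\hat F_{\mathcal{K}}; F]\bigr)^r \, dx
   \le \int_{\mathcal{D}_0} U_{\mathcal{K},F}^r(x) \, dx
   = \|U_{\mathcal{K},F}(\cdot)\|_r^r .
\end{aligned}
```

Taking the $r$-th root of both sides gives the displayed global inequality.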
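If, for a fixed family of kernels $\mathcal{K} \in \mathcal{P}(\mathfrak{K})$, one can prove that $\mathbb{F} \subset \mathbb{F}(\mathcal{K})$ and

$$\sup_{F \in \mathbb{F}} \|U_{\mathcal{K},F}(\cdot)\|_r \asymp \inf_{\tilde F} \sup_{F \in \mathbb{F}} \mathcal{R}_{L_r}[\tilde F; F] \quad \text{as } \varepsilon \to 0,$$

then the corresponding estimator $\hat F_{\mathcal{K}}$ is minimax on $\mathbb{F}$ with respect to $L_r$-risk. Local
inequalities (3) are powerful tools for derivation of global minimax results in problems
of estimating functions with inhomogeneous structure.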

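Local and global minimax adaptive results are obtained in a similar way. In the framework
of the minimax adaptive approach, $F$ is assumed to belong to $\bigcup_{s \in S} \mathbb{F}_s$, where
$\{\mathbb{F}_s, s \in S\}$ is a given collection of sets. The objective is to find an estimator $\hat F$ such that
for every $s \in S$,

$$\sup_{F \in \mathbb{F}_s} \mathcal{R}_r[\hat F; F] \asymp \inf_{\tilde F} \sup_{F \in \mathbb{F}_s} \mathcal{R}_r[\tilde F; F] \quad \text{as } \varepsilon \to 0. \tag{6}$$

If, for some $\mathcal{K} \in \mathcal{P}(\mathfrak{K})$, one can show that $\mathbb{F}_s \subset \mathbb{F}(\mathcal{K})$ for all $s \in S$ and

$$\sup_{F \in \mathbb{F}_s} U_{\mathcal{K},F}(x) \asymp \inf_{\tilde F} \sup_{F \in \mathbb{F}_s} \mathcal{R}_r[\tilde F; F] \quad \text{as } \varepsilon \to 0,$$

then the estimator $\hat F_{\mathcal{K}}$ is minimax adaptive for the collection $\{\mathbb{F}_s, s \in S\}$. Moreover, if
$\mathbb{F}_s \subset \mathbb{F}(\mathcal{K})$, $\forall s \in S$, and

$$\sup_{F \in \mathbb{F}_s} \|U_{\mathcal{K},F}(\cdot)\|_r \asymp \inf_{\tilde F} \sup_{F \in \mathbb{F}_s} \mathcal{R}_{L_r}[\tilde F; F] \quad \text{as } \varepsilon \to 0,$$

then $\hat F_{\mathcal{K}}$ is minimax adaptive for $\{\mathbb{F}_s, s \in S\}$ with respect to the $L_r$-risk.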

The rest of the paper is organized in the following way. In Section 2, we introduce
notation and assumptions that are used throughout the paper and prove some preparatory
results. In Section 3, we present our selection procedure, discuss its connections to
other procedures and state the main result of this paper (Theorem 1). In Section 4, we
apply the developed selection procedure to the aforementioned nonparametric estimation
problems (i)-(iii). Section 5 contains the proof of Theorem 1. In Section 6, we prove all
the results appearing in Section 4. Auxiliary results and proofs are collected in the Appendix.

2. Preliminaries 

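We will use the following notation: $\|\cdot\|_p$ denotes the $L_p(\mathcal{D}_0)$-norm, while $\|\cdot\|_{p,\infty}$ denotes
the $L_{p,\infty}(\mathbb{R}^d \times \mathcal{D}_0)$-norm,

$$\|G\|_{p,\infty} = \sup_{x \in \mathcal{D}_0} \Bigl( \int_{\mathbb{R}^d} |G(t, x)|^p \, dt \Bigr)^{1/p}, \qquad p \in [1, \infty].$$

We also write $|\cdot|_2$ for the Euclidean norm.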

Basic families of kernels. Let $\Theta \subset \mathbb{R}^m$ be a compact set and consider a parameterized
family of kernels $\mathcal{K}_\Theta = \{K_\mu, \mu \in \Theta\}$, where $K_\mu : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$. Throughout the paper, we
consider families of kernels $\mathcal{K}_\Theta$ satisfying the following conditions.

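(K0) Let $\mathcal{D}_1$ be an open interval in $\mathbb{R}^d$ such that $\mathcal{D}_0 \subset \mathcal{D}_1 \subset \mathcal{D}$. For all $\mu \in \Theta$, one
has

$$\operatorname{supp}(K_\mu(\cdot, y)) \subseteq \mathcal{D}_1 \quad \forall y \in \mathcal{D}_0, \qquad \int K_\mu(t, y)\,dt = 1 \quad \forall y \in \mathcal{D}_1. \tag{7}$$

Moreover,

$$\sigma(\mathcal{K}_\Theta) := \sup_{\mu \in \Theta} \|K_\mu\|_{2,\infty} < \infty, \tag{8}$$

$$M(\mathcal{K}_\Theta) := \sup_{\mu \in \Theta} \|K_\mu\|_{1,\infty} < \infty. \tag{9}$$

Note that (7) implies that $M(\mathcal{K}_\Theta) \ge 1$. Conditions (7)-(9) are standard in the context
of kernel estimation.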

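In the construction of our selection rule, we use the auxiliary kernel collection $\mathcal{K}_{\Theta \times \Theta} =
\{K_{\mu,\nu}, \mu, \nu \in \Theta\}$, $K_{\mu,\nu} : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, defined as

$$K_{\mu,\nu}(t, x) := \int_{\mathcal{D}_1} K_\mu(t, y) K_\nu(y, x)\,dy, \qquad t \in \mathcal{D},\ x \in \mathcal{D}_0.$$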

In what follows, we will assume that the following "commutativity property" is fulfilled
for the kernels from $\mathcal{K}_\Theta$:

(K1)

$$K_{\mu,\nu} = K_{\nu,\mu} \qquad \forall \mu, \nu \in \Theta. \tag{10}$$

Remark 1. Assumption (K1) is crucial for the construction of our selection procedure.
Although this is a restriction on the family $\mathcal{K}_\Theta$, (10) is trivially fulfilled for kernels
$K_\mu(t, x) = K_\mu(t - x)$ that correspond to standard kernel estimators.
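
The commutativity property for convolution-type kernels can be checked numerically. The sketch below assumes $d = 1$ and box kernels and approximates the integral defining $K_{\mu,\nu}$ by a midpoint rule; the function names, the integration window and the grid size are illustrative assumptions.

```python
def box(u):
    """Box kernel on [-1/2, 1/2]."""
    return 1.0 if abs(u) <= 0.5 else 0.0


def K(h, t, x):
    """Convolution-type kernel K_h(t, x) = h^{-1} K((t - x) / h)."""
    return box((t - x) / h) / h


def composite(mu, nu, t, x, lo=-2.0, hi=2.0, n=8000):
    """Midpoint-rule approximation of K_{mu,nu}(t, x) = int K_mu(t, y) K_nu(y, x) dy."""
    dy = (hi - lo) / n
    ys = (lo + (j + 0.5) * dy for j in range(n))
    return sum(K(mu, t, y) * K(nu, y, x) for y in ys) * dy


# Largest discrepancy between K_{mu,nu} and K_{nu,mu} over a few test points.
gap = max(abs(composite(0.3, 0.7, t, 0.0) - composite(0.7, 0.3, t, 0.0))
          for t in [-0.6, -0.25, 0.0, 0.1, 0.45])
center = composite(0.3, 0.7, 0.0, 0.0)
```

For two box kernels the composite kernel is a trapezoid, so `center` should be close to $0.3 / (0.3 \cdot 0.7)$, and `gap` should vanish up to discretization error.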

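The next statement establishes an important property of the kernel $K_{\mu,\nu}$, $\mu, \nu \in \Theta$.

With any function $F$, we associate the quantities

$$B_{\mu,\nu}(x) = \int_{\mathcal{D}} K_{\mu,\nu}(t, x) F(t)\,dt - F(x), \tag{11}$$

$$B_\nu(x) = \int_{\mathcal{D}} K_\nu(t, x) F(t)\,dt - F(x), \qquad x \in \mathcal{D}_0. \tag{12}$$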

Proposition 1. Let (7) hold. Then for any $x \in \mathcal{D}_0$ and $F \in C_b(\mathcal{D})$, one has

$$B_{\mu,\nu}(x) - B_\nu(x) = \int_{\mathcal{D}} K_\nu(y, x) B_\mu(y)\,dy. \tag{13}$$

The proof of the proposition is given in the Appendix.
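
The identity (13) can be motivated by a short computation. The following is only a sketch, assuming that Fubini's theorem applies, that $K_\nu(\cdot, x)$ vanishes outside $\mathcal{D}_1$ as in (7), and that the definition (12) of $B_\mu(y)$ is read for all $y \in \mathcal{D}_1$.

```latex
\begin{aligned}
B_{\mu,\nu}(x) + F(x)
  &= \int_{\mathcal{D}} \Bigl[ \int_{\mathcal{D}_1} K_\mu(t, y) K_\nu(y, x)\,dy \Bigr] F(t)\,dt
   = \int_{\mathcal{D}_1} K_\nu(y, x) \Bigl[ \int_{\mathcal{D}} K_\mu(t, y) F(t)\,dt \Bigr] dy \\
  &= \int_{\mathcal{D}_1} K_\nu(y, x) \bigl[ B_\mu(y) + F(y) \bigr] dy
   = \int_{\mathcal{D}_1} K_\nu(y, x) B_\mu(y)\,dy + B_\nu(x) + F(x).
\end{aligned}
```

Subtracting $B_\nu(x) + F(x)$ from both sides gives the stated relation.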

Remark 2. Note that $K_{\mu,\nu}$ is a kernel for all $\mu, \nu \in \Theta$, that is, $\int_{\mathcal{D}} K_{\mu,\nu}(t, x)\,dt = 1$,
$\forall x \in \mathcal{D}_0$. This fact follows immediately from (13) if we put $F \equiv 1$.

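Auxiliary estimators and selection statistics. With the collections $\mathcal{K}_\Theta$ and $\mathcal{K}_{\Theta \times \Theta}$, we
associate the following families of linear estimators via (2):

$$\mathcal{F}(\mathcal{K}_\Theta) = \{\hat F_\mu = \hat F_{K_\mu},\ \mu \in \Theta\}; \qquad \mathcal{F}(\mathcal{K}_{\Theta \times \Theta}) = \{\hat F_{\mu,\nu} = \hat F_{K_{\mu,\nu}},\ \mu, \nu \in \Theta\}.$$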

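It is easily seen that

$$\hat F_\mu(x) - F(x) = B_\mu(x) + \varepsilon \xi_\mu(x), \qquad \hat F_{\mu,\nu}(x) - F(x) = B_{\mu,\nu}(x) + \varepsilon \xi_{\mu,\nu}(x),$$

where

$$\xi_\mu(x) = \int K_\mu(t, x)\,W(dt), \qquad \xi_{\mu,\nu}(x) = \int K_{\mu,\nu}(t, x)\,W(dt).$$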



Thus, the quantities $B_\mu(x)$ and $B_{\mu,\nu}(x)$ defined in (11)-(12) represent the bias of $\hat F_\mu(x)$
and $\hat F_{\mu,\nu}(x)$, respectively. In addition, we denote $\sigma_\mu^2(x) = \operatorname{var}\{\xi_\mu(x)\} = \|K_\mu(\cdot, x)\|_2^2$.

Our selection procedure will be based on the statistics $\{\hat F_{\mu,\nu}(x) - \hat F_\nu(x),\ \mu, \nu \in \Theta\}$. It
is clear that

$$\hat F_{\mu,\nu}(x) - \hat F_\nu(x) = B_{\mu,\nu}(x) - B_\nu(x) + \varepsilon[\xi_{\mu,\nu}(x) - \xi_\nu(x)], \tag{14}$$

where $\xi_{\mu,\nu}(x) - \xi_\nu(x)$ is a Gaussian zero-mean random variable with variance

$$\sigma_{\mu,\nu}^2(x) = \operatorname{var}\{\xi_{\mu,\nu}(x) - \xi_\nu(x)\} = \|K_{\mu,\nu}(\cdot, x) - K_\nu(\cdot, x)\|_2^2.$$

Also, note that $\hat F_{\mu,\nu}(x) = \hat F_{\nu,\mu}(x)$, in view of (K1).

Integrated bias and variance. With any estimator $\hat F_\mu$, $\mu \in \Theta$, we associate the following
two quantities:

$$\bar B_\mu(x) := \sup_{\nu \in \Theta} |B_{\mu,\nu}(x) - B_\nu(x)| \vee |B_\mu(x)|, \tag{15}$$

$$\bar\sigma_\mu(x) := \Bigl[ \sup_{\nu \in \Theta} \int |K_\nu(y, x)|\, \sigma_\mu(y)\,dy \Bigr] \vee \sigma_\mu(x). \tag{16}$$

In words, $\bar B_\mu$ is the maximum among the maximal integrated (with kernels $K_\nu$) bias
and the bias of $\hat F_\mu$, while $\bar\sigma_\mu$ is the maximum among the maximal integrated standard
deviation of $\hat F_\mu$ and standard deviation of $\hat F_\mu$. In what follows, with slight abuse of
terminology, we will refer to $\bar B_\mu(x)$ and $\bar\sigma_\mu(x)$ as the integrated bias of $\hat F_\mu$ and the
integrated variance of $\hat F_\mu$, respectively.
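It follows from (13) and (9) that

$$\bar B_\mu(x) \le M(\mathcal{K}_\Theta) \sup_y |B_\mu(y)|, \qquad \bar\sigma_\mu(x) \le M(\mathcal{K}_\Theta) \sup_y \sigma_\mu(y). \tag{17}$$

We also have the following upper bound on $\sigma_{\mu,\nu}(x)$ in terms of $\bar\sigma_\mu(x)$ and $\bar\sigma_\nu(x)$: for all
$\mu, \nu \in \Theta$,

$$\sigma_{\mu,\nu}(x) \le \|K_{\mu,\nu}(\cdot, x)\|_2 + \|K_\nu(\cdot, x)\|_2 \le \bar\sigma_\mu(x) + \sigma_\nu(x) \le \bar\sigma_\mu(x) + \bar\sigma_\nu(x). \tag{18}$$

Here, we have used the triangle inequality and the Minkowski inequality for integrals.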

In what follows, point x is fixed. So, in our notation, we will not indicate dependence 
on x when this does not lead to confusion. 



3. Selection procedure and main result 



In this section, we introduce our selection rule. 
3.1. Majorant

We begin with the definition of the majorant, the main ingredient of our construction. 

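Let $\Sigma_\Theta := \{\bar\sigma_\mu : \mu \in \Theta\} \subset \mathbb{R}_+$ and define $\sigma_{\min} := \inf \Sigma_\Theta$, $\sigma_{\max} := \sup \Sigma_\Theta$. Thus, $\Sigma_\Theta$ is
the image of $\Theta$ under the mapping $\mu \mapsto \bar\sigma_\mu$, where $\bar\sigma_\mu$ is defined in (16). Let

$$e_{\mathcal{K}_\Theta}(\sigma) := \sup_{\mu \in \Theta} \mathbb{E} \sup_{\nu:\, \bar\sigma_\nu \le \sigma} |\xi_{\mu,\nu} - \xi_\nu|, \qquad \sigma \in \Sigma_\Theta. \tag{19}$$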

Remark 3. By definition, the function $e_{\mathcal{K}_\Theta}(\cdot)$ is non-decreasing on $\Sigma_\Theta$. For any given
$\sigma \in \Sigma_\Theta$, $e_{\mathcal{K}_\Theta}(\sigma)$ is the maximal (over $\mu \in \Theta$) expectation of the supremum of the Gaussian
zero-mean random process $\{\xi_{\mu,\nu} - \xi_\nu\}$ with the index set $\{\nu : \bar\sigma_\nu \le \sigma\} \subset \Theta$. The covariance
structure of this process is completely determined by the family of kernels $\mathcal{K}_\Theta$. Thus, the
function $e_{\mathcal{K}_\Theta}(\cdot)$ can be computed, for example, using Monte Carlo simulations. Alternatively,
useful analytical bounds on $e_{\mathcal{K}_\Theta}(\cdot)$ can be derived from the theory of Gaussian
processes.
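
The Monte Carlo computation mentioned in Remark 3 might look as follows. This is a rough sketch under strong simplifying assumptions: $d = 1$, $x = 0$, box kernels (for which the $L_1$-norm equals one, so the integrated standard deviation coincides with $\nu^{-1/2}$), a finite bandwidth grid, and a discretized white noise. All names and numerical settings are hypothetical.

```python
import math
import random


def box(h, t):
    """Box kernel K_h(t) = h^{-1} 1{|t| <= h/2}; its L2-norm is h^{-1/2}."""
    return 1.0 / h if abs(t) <= h / 2 else 0.0


def box_conv(mu, nu, t):
    """K_{mu,nu} = K_mu * K_nu in closed form (a trapezoid)."""
    a, b = min(mu, nu), max(mu, nu)
    return max(0.0, min(a, (a + b) / 2 - abs(t))) / (a * b)


def mc_sup_expectation(bandwidths, sigma, n_rep=100, n_cells=400, seed=0):
    """Estimate sup_mu E sup_{nu: sigma_nu <= sigma} |xi_{mu,nu} - xi_nu| at x = 0."""
    rng = random.Random(seed)  # fixed seed: the same noise is reused for every sigma
    dt = 2.0 / n_cells
    grid = [-1.0 + (i + 0.5) * dt for i in range(n_cells)]
    allowed = [nu for nu in bandwidths if nu ** -0.5 <= sigma]
    # Weights of the stochastic term: K_{mu,nu}(t) - K_nu(t) on the grid.
    weights = {(mu, nu): [box_conv(mu, nu, t) - box(nu, t) for t in grid]
               for mu in bandwidths for nu in allowed}
    totals = {mu: 0.0 for mu in bandwidths}
    for _ in range(n_rep):
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in grid]
        for mu in bandwidths:
            totals[mu] += max(
                (abs(sum(w * z for w, z in zip(weights[mu, nu], dW))) for nu in allowed),
                default=0.0)
    return max(totals.values()) / n_rep


hs = [0.1, 0.2, 0.4, 0.8]
e_small = mc_sup_expectation(hs, sigma=1.6)  # only nu in {0.4, 0.8} are admissible
e_large = mc_sup_expectation(hs, sigma=3.2)  # all four bandwidths are admissible
```

Because the same noise draws are reused for both values of `sigma`, the estimate is non-decreasing in `sigma` by construction, mirroring the monotonicity noted in the remark.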

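(E) Let $e(\sigma)$ be a continuous non-decreasing function on $\Sigma_\Theta$ such that

(i) $e(\sigma) \ge e_{\mathcal{K}_\Theta}(\sigma)$, $\forall \sigma \in \Sigma_\Theta$;

(ii) there exist absolute constants $1 < c_e \le C_e$ such that

$$c_e \le \frac{e(2\sigma)}{e(\sigma)} \le C_e \qquad \forall \sigma \in \Sigma_\Theta. \tag{20}$$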

Remark 4. The function $e(\cdot)$ is an upper bound on $e_{\mathcal{K}_\Theta}(\cdot)$. Such a bound can be derived
from general inequalities on suprema of Gaussian processes. Condition (20) holds, for
example, if $e(\sigma) = c\sigma L(\sigma)$, where $c$ is a constant and $L(\sigma)$ is a slowly varying function.
In fact, for our purposes, it is sufficient to require that inequalities in (20) hold for the
ratio $e(a\sigma)/e(\sigma)$ for some $a > 1$.

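We are now in a position to define the majorant:

$$Q(\sigma) := \chi e(\sigma) + \sigma \sqrt{1 + \chi_1 \ln\frac{\sigma}{\sigma_{\min}}}, \qquad \sigma \in \Sigma_\Theta, \tag{21}$$

where $\chi = 2C_e$ and $\chi_1 = 128 r (1 \vee \ln C_e / \ln 2)$.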
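
For concreteness, the majorant can be evaluated as in the sketch below. It reads (21) as $Q(\sigma) = \chi e(\sigma) + \sigma\sqrt{1 + \chi_1 \ln(\sigma/\sigma_{\min})}$ with $\chi = 2C_e$ and $\chi_1 = 128r(1 \vee \ln C_e/\ln 2)$; the bound `e` and all numerical values are hypothetical.

```python
import math


def majorant(sigma, sigma_min, e, C_e, r):
    """Q(sigma) = chi * e(sigma) + sigma * sqrt(1 + chi_1 * ln(sigma / sigma_min))."""
    chi = 2.0 * C_e
    chi_1 = 128.0 * r * max(1.0, math.log(C_e) / math.log(2.0))
    return chi * e(sigma) + sigma * math.sqrt(1.0 + chi_1 * math.log(sigma / sigma_min))


# Hypothetical bound of the form c * sigma * L(sigma) with a slowly varying L.
e = lambda s: 0.5 * s * math.sqrt(math.log(s))

q_min = majorant(2.0, sigma_min=2.0, e=e, C_e=3.0, r=2.0)
q_big = majorant(8.0, sigma_min=2.0, e=e, C_e=3.0, r=2.0)
```

At $\sigma = \sigma_{\min}$ the logarithm vanishes, so the second term reduces to $\sigma_{\min}$; the logarithmic factor only matters when the family contains estimators with different variances.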

Remark 5. Loosely speaking, the majorant uniformly bounds from above the random
process $\xi_{\mu,\nu} - \xi_\nu$, $\mu, \nu \in \Theta$, with prescribed probability. The function $Q$ consists of two
terms. The first term bounds the expectation of the supremum of a zero-mean Gaussian
random process, while the second term controls the deviation of this supremum from its
expectation. In fact, the first term characterizes "massiveness" of the subset of estimators
from $\mathcal{F}(\mathcal{K}_\Theta)$ with variance less than a prescribed level. The second term involves a
logarithm of the ratio of estimator variances in the family. It can be regarded as a price
to be paid for considering families of estimators with different variances.
We are now in a position to define our selection rule.
For any $\mu \in \Theta$, let



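$$\hat R_\mu := \sup_{\nu \in \Theta} \bigl\{ |\hat F_{\mu,\nu} - \hat F_\nu| - \tfrac{1}{2}\varepsilon Q(\bar\sigma_\nu) \bigr\}. \tag{22}$$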

Let $\delta = \tfrac{1}{4}\varepsilon Q(\sigma_{\min})$ and let $\hat\mu \in \Theta$ be such that

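$$\hat R_{\hat\mu} + \varepsilon Q(\bar\sigma_{\hat\mu}) \le \inf_{\mu \in \Theta} \{\hat R_\mu + \varepsilon Q(\bar\sigma_\mu)\} + \delta. \tag{23}$$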

We then define 

$$\hat F = \hat F_{\hat\mu}. \tag{24}$$
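
On a finite grid of parameters, the rule (22)-(24) reduces to a few lines of code. The sketch below is illustrative only: it assumes the estimates and majorant values have already been computed, takes the tolerance $\delta$ to be zero (exact minimization), and exposes the fraction of $\varepsilon Q$ subtracted inside the supremum as the parameter `frac`, whose default value is an assumption of this sketch.

```python
def select(F_pair, F_single, Q, eps, frac=0.5):
    """Return (mu_hat, scores) where mu_hat minimizes R_mu + eps * Q[mu].

    F_pair[mu][nu] holds the auxiliary estimate F_{mu,nu}(x),
    F_single[nu] holds F_nu(x), and Q[nu] is the majorant at sigma-bar_nu.
    """
    n = len(F_single)

    def R(mu):
        return max(abs(F_pair[mu][nu] - F_single[nu]) - frac * eps * Q[nu]
                   for nu in range(n))

    scores = [R(mu) + eps * Q[mu] for mu in range(n)]
    return min(range(n), key=scores.__getitem__), scores


# Toy input: index 0 is the noisiest estimator, index 2 the most biased one.
F_single = [1.00, 1.05, 1.60]
F_pair = [[1.00, 1.04, 1.50],
          [1.04, 1.06, 1.55],
          [1.50, 1.55, 1.70]]
Q = [3.0, 2.0, 1.0]
mu_hat, scores = select(F_pair, F_single, Q, eps=0.1)
```

In this toy input the middle estimator is selected: the first one pays a large majorant, while the last one disagrees strongly with the others.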

Several remarks on the above definition are in order. First, observe that $\hat R_\mu$ may be
negative; however, by definition,

$$\hat R_\mu \ge -\tfrac{1}{2}\varepsilon Q(\bar\sigma_\mu) \qquad \forall \mu \in \Theta, \tag{25}$$

so that $\hat R_\mu + \varepsilon Q(\bar\sigma_\mu)$ is always positive. Second, in order to ensure that there exists a
measurable choice of $\hat\mu$ satisfying (23), one needs to impose additional conditions on the
family of kernels $\mathcal{K}_\Theta$. The next assumption provides such conditions.

(K2) There exist positive constants $L$ and $\gamma \in (0, 1]$ such that

sup "Y^^ i, (26) 

SUp j -p= ^ A,, (ZCj 

IM-MI2 

w/iere = if /i (-,x)/||A^(-,x)|| 2 , V/i G 6. 

In the proof of Theorem 1, we show that (K0)-(K2), and boundedness and continuity of
$F$ imply that there exists a measurable choice of $\hat\mu \in \Theta$ such that (23) holds. Thus, our
selection rule is well defined.



3.3. Discussion 

In this section, we explain the main idea underlying the construction of our selection 
scheme and discuss connections to other procedures in the literature. 

The pointwise selection procedures were developed by Lepski (1990, 1991), Lepski et al.
(1997), Lepski and Spokoiny (1997) and Kerkyacharian et al. (2001). In those papers, the
procedures are two-staged: first, a collection of admissible estimators is constructed using
a "bias-variance" comparison scheme; second, among admissible estimators, an estimator
with minimal variance is selected. The procedure in Lepski (1990, 1991) (and its refinements
in Lepski et al. (1997), Lepski and Spokoiny (1997)) selects from the collection
$\mathcal{F}(\mathcal{K}_1)$ of one-dimensional kernel estimators (see Example 1 in Section 1) discretized in
an appropriate way. In our notation, it reads as follows:

select the estimator with maximal bandwidth $\mu \in [h_{\min}, h_{\max}]_\#$ such that

$$|\hat F_\mu - \hat F_\nu| \le T(\mu, \nu) \qquad \forall \nu \in [h_{\min}, h_{\max}]_\# : \sigma_\nu \ge \sigma_\mu, \tag{28}$$

where $A_\#$ stands for a discretization of a set $A$ and $T(\mu, \nu)$ is a certain threshold.

Here, the set of admissible estimators contains all estimators $\hat F_\mu$, $\mu \in [h_{\min}, h_{\max}]_\#$, satisfying
(28) and at the selection stage, the estimator with minimal variance (maximal
bandwidth) is chosen. This scheme exploits monotonicity properties of the bias and variance
with respect to the bandwidth which, in general, do not hold in the multidimensional
case.
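
The two-stage scheme around (28) can be sketched as follows for a finite list of one-dimensional estimators. The threshold used in the example is a hypothetical choice (proportional to the standard deviation of the noisier estimator), and the function name is illustrative.

```python
def lepski_select(estimates, sigmas, threshold):
    """Index of the minimal-variance estimate that is admissible in the sense of (28).

    An estimate mu is admissible if it is within threshold(mu, nu) of every
    estimate nu whose standard deviation is at least as large.
    """
    n = len(estimates)
    for mu in sorted(range(n), key=lambda i: sigmas[i]):  # smallest sigma first
        if all(abs(estimates[mu] - estimates[nu]) <= threshold(mu, nu)
               for nu in range(n) if sigmas[nu] >= sigmas[mu]):
            return mu
    return max(range(n), key=lambda i: sigmas[i])


# Bandwidths 0.1, 0.2, 0.4 with standard deviations proportional to h^{-1/2}.
estimates = [1.00, 1.02, 1.30]
sigmas = [3.16, 2.24, 1.58]
chosen = lepski_select(estimates, sigmas, lambda mu, nu: 0.03 * sigmas[nu])
```

Here the largest bandwidth is rejected because its estimate is too far from the one with the smallest bandwidth, so the middle bandwidth is chosen.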

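A generalization of (28) to the multidimensional case was developed in Kerkyacharian et al.
(2001). Their procedure is designed for selection from the properly discretized collection
$\mathcal{F}(\mathcal{K}_{\mathcal{H}})$ (see Example 2, Section 1) and can be represented as follows:

For $\mu = (\mu_1, \ldots, \mu_d) \in \mathcal{H}_\#$ and $\nu = (\nu_1, \ldots, \nu_d) \in \mathcal{H}_\#$, define $\mu \vee \nu = (\mu_1 \vee \nu_1, \ldots, \mu_d \vee
\nu_d)$ and consider the auxiliary estimator $\hat F_{\mu,\nu} = \hat F_{\mu \vee \nu}$. The estimator $\hat F_\mu$, $\mu \in \mathcal{H}_\#$,
is called admissible if

$$|\hat F_{\mu,\nu} - \hat F_\nu| \le T(\nu) \qquad \forall \nu \in \mathcal{H}_\# : \sigma_\nu \ge \sigma_\mu, \tag{29}$$

where $T(\nu)$ is an appropriate threshold. Note that (29) can be rewritten as

$$\sup_{\nu \in \mathcal{H}_\#:\, \sigma_\nu \ge \sigma_\mu} \bigl[ |\hat F_{\mu,\nu} - \hat F_\nu| - T(\nu) \bigr] \le 0. \tag{30}$$

At the selection stage, we choose the admissible estimator with minimal variance.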

Note that the scheme (29) involves an auxiliary estimator and its construction can
only be used for selection from the collection $\mathcal{F}(\mathcal{K}_{\mathcal{H}})$. Specifically, the procedure (29)
cannot be applied for selection from the collection of kernel estimators $\mathcal{F}(\mathcal{K}_{SI})$ (see
Example 3, Section 1) corresponding to the single index model.

Our selection procedure (22)-(24) also uses an auxiliary estimator $\hat F_{\mu,\nu}$, but, in contrast
to (29), the construction of $\hat F_{\mu,\nu}$ is universal and fits a wide variety of kernel collections.
In addition, instead of pairwise comparisons with a threshold (as in (28) and (29)), we 
define the majorant function and use direct minimization. Our rule (23) is very much 
in the spirit of (30). Indeed, the procedure of Kerkyacharian et al. (2001) minimizes $\sigma_\mu$
subject to constraint (30), while in (23), we minimize, with respect to $\mu$, the expression
$\sup_\nu [|\hat F_{\mu,\nu} - \hat F_\nu| - \tfrac{1}{2}T(\nu)] + T(\mu)$ and $T(\mu)$ is "roughly" proportional to $\bar\sigma_\mu$.

Summing up, the proposed selection method differs from other pointwise selection
procedures in: (a) construction of the auxiliary estimators $\hat F_{\mu,\nu}$; (b) selection by direct
minimization. These features enable a wide variety of kernel collections to be treated in
a unified way and the discretization of the parameter space $\Theta$ to be avoided.
In order to present an upper bound on the risk of the proposed estimator, we need the
following definition.

For any function $F \in C_b(\mathcal{D})$ and given collection $\mathcal{K}_\Theta$, define

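$$\Theta_F(\mathcal{K}_\Theta) := \{\mu \in \Theta : \forall \sigma \ge \bar\sigma_\mu,\ \sigma \in \Sigma_\Theta,\ \exists \theta \in \Theta \text{ such that } \bar\sigma_\theta = \sigma \text{ and } \bar B_\theta \le \tfrac{1}{2}\varepsilon Q(\bar\sigma_\theta)\}.$$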

In what follows, we will consider functions $F$ for which $\Theta_F(\mathcal{K}_\Theta)$ is non-empty. This
condition is closely related to the existence of estimators in $\mathcal{F}(\mathcal{K}_\Theta)$ realizing the
bias-variance trade-off.

Remark 6. Clearly, $\Theta_F(\mathcal{K}_\Theta)$ is non-empty for any constant function $F$ since, by (13)
and (15), $\bar B_\theta = 0$ for all $\theta \in \Theta$. For the same reason, if $K_\theta$ is orthogonal to all polynomials
of degree $\le l$, then $\Theta_F(\mathcal{K}_\Theta)$ is non-empty for any $F$ which is a polynomial of degree $\le l$.
In general, the size of the set of functions $F$ for which $\Theta_F(\mathcal{K}_\Theta)$ is non-empty is completely
determined by the family $\mathcal{K}_\Theta$. For example, if $\mathcal{F}(\mathcal{K}_\Theta)$ is the family of standard kernel
estimators with a bounded kernel $K_\theta$ and bandwidth $\theta = (h_1, \ldots, h_d) \in [\varepsilon^2, 1/2]^d$, then
$\Theta_F(\mathcal{K}_\Theta)$ is non-empty for any $F \in C_b(\mathcal{D})$.

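Finally, we put

$$\mu^* = \arg\inf_{\mu \in \Theta_F(\mathcal{K}_\Theta)} \bar\sigma_\mu. \tag{31}$$

Theorem 1. Suppose that assumptions (K0)-(K2) and (E) hold. Then, for any $F \in
C_b(\mathcal{D})$ such that $\Theta_F(\mathcal{K}_\Theta) \ne \emptyset$, and for all $\varepsilon$ small enough, one has

$$\mathcal{R}_r[\hat F; F] \le C \varepsilon Q(\bar\sigma_{\mu^*}),$$

where $C$ is a numerical constant depending only on $r$, $c_e$ and $C_e$.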

4. Applications 

In this section, we show how the upper bound of Theorem 1 can be used for the derivation 
of minimax and adaptive minimax results. In particular, in Sections 4.2-4.4, we consider 
three particular problems: 

• pointwise adaptive estimation in the single index model; 

• pointwise minimax estimation over a union of anisotropic Hölder classes;

• global minimax estimation over isotropic Besov classes. 

Our goal here is to show how a careful choice of the family of kernels leads to estimators 
with optimal statistical properties. Note that in each particular case, the estimators are 
different, although all of them are obtained by the same computational routine presented
in Section 3. 

In Section 4.5, we demonstrate that the choice of a rather huge kernel collection allows 
a single estimator to be constructed which is simultaneously optimal (up to a log-factor) 
for these three entirely different problems. 



4.1. General kernel collection 

Let $G : \mathbb{R}^d \to \mathbb{R}$ be a function supported on $[-1/2, 1/2]^d$ and satisfying the conditions

$$\int G(t)\,dt = 1, \qquad \int t^r G(t)\,dt = 0 \quad \forall |r| = 1, \ldots, l, \qquad \sup_t |\nabla G(t)|_2 \le M, \tag{32}$$

where $r = (r_1, \ldots, r_d)$, $r_i \ge 0$, $|r| = r_1 + \cdots + r_d$ and $t^r = t_1^{r_1} \cdots t_d^{r_d}$.

Let $\mathcal{E}$ denote the set of $d \times d$ orthogonal matrices and let $\mathcal{H} = [h_{\min}, h_{\max}]^d$, where
$0 < h_{\min} \le h_{\max} \le 1/2$ are given real numbers.
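Define, for all $h \in \mathcal{H}$ and all $E \in \mathcal{E}$,

$$G_h(t) := \Bigl[\prod_{i=1}^d h_i\Bigr]^{-1} G\Bigl(\frac{t_1}{h_1}, \ldots, \frac{t_d}{h_d}\Bigr), \qquad G_{h,E}(t, x) = G_h(E^T[t - x]) \tag{33}$$

and consider the following collection of kernels:

$$\mathcal{K}_{\mathcal{H},\mathcal{E}} = \{G_{h,E},\ (h, E) \in \mathcal{H} \times \mathcal{E}\}. \tag{34}$$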

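Remark 7.

1. For the family $\mathcal{K}_{\mathcal{H},\mathcal{E}}$, we have

$$\sigma_{h,E}(x) = \sigma_{h,E} = \|G\|_2 \prod_{i=1}^d h_i^{-1/2} \qquad \forall x \in \mathcal{D}_0,\ \forall E \in \mathcal{E}$$

and, therefore, $\bar\sigma_{h,E}(x) = \|G\|_1 \sigma_{h,E}$, $\forall x \in \mathcal{D}_0$,

$$\sigma_{\min} = \|G\|_1 \|G\|_2 h_{\max}^{-d/2}, \qquad \sigma_{\max} = \|G\|_1 \|G\|_2 h_{\min}^{-d/2}.$$

2. Assumptions (K0)-(K2) are fulfilled for the family $\mathcal{K}_{\mathcal{H},\mathcal{E}}$. Indeed, (K0) holds trivially;
here, $M(\mathcal{K}_{\mathcal{H},\mathcal{E}}) = \|G\|_1$. Assumption (K1) is fulfilled because $\mathcal{K}_{\mathcal{H},\mathcal{E}}$ consists of
convolution kernels. Boundedness of the gradient of $G$ in (32), along with (K0) and
(K1), implies (K2) (see Lemmas 1 and 2 in Section 6.1).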
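
The quantities in Remark 7 are easy to tabulate. The sketch below assumes that the norms $\|G\|_1$ and $\|G\|_2$ are supplied as numbers; the values used in the example (both equal to one) are hypothetical and serve only to make the arithmetic transparent.

```python
import math


def sigma(h, G_l2):
    """sigma_{h,E} = ||G||_2 * prod_i h_i^{-1/2}; it does not depend on the rotation E."""
    return G_l2 / math.sqrt(math.prod(h))


def sigma_bar(h, G_l1, G_l2):
    """Integrated standard deviation: ||G||_1 * sigma_{h,E}."""
    return G_l1 * sigma(h, G_l2)


def sigma_range(d, h_min, h_max, G_l1, G_l2):
    """(sigma_min, sigma_max) over the bandwidth set [h_min, h_max]^d."""
    return (sigma_bar([h_max] * d, G_l1, G_l2),
            sigma_bar([h_min] * d, G_l1, G_l2))


s_min, s_max = sigma_range(d=2, h_min=0.01, h_max=0.25, G_l1=1.0, G_l2=1.0)
```

With $d = 2$, the smallest value is attained at $h = (h_{\max}, h_{\max})$ and equals $h_{\max}^{-1}$, while the largest equals $h_{\min}^{-1}$.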

In order to construct estimators in the aforementioned problems, we will consider
families corresponding to different subsets of $\mathcal{K}_{\mathcal{H},\mathcal{E}}$. The family of estimators $\mathcal{F}(\mathcal{K}_{\mathcal{H},\mathcal{E}})$
will be considered in Section 4.5.
4.2. Pointwise adaptive estimation in the single index model

Consider the model (1) with $F(t) = f(\omega^T t)$, where $f : \mathbb{R} \to \mathbb{R}$ is an unknown function
from the Hölder ball $\mathbb{H}_1(\alpha, L)$ with unknown parameters $\alpha > 0$ and $L > 0$, and $\omega \in \mathbb{S}^{d-1}$
is an unknown direction vector. We refer to this model as the single index model.

Definition 1. We say that function $F$ belongs to the functional class $\mathbb{F}_{SI}(\alpha, L)$ if there
exists a direction vector $\omega \in \mathbb{S}^{d-1}$, parameters $\alpha > 0$, $L > 0$ and a univariate function
$f \in \mathbb{H}_1(\alpha, L)$ such that $F(t) = f(\omega^T t)$.

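Define $\mathcal{H}_1 = \{h \in \mathcal{H} : h = (h_1, h_{\max}, \ldots, h_{\max})\}$, $\Theta_{SI} = \mathcal{H}_1 \times \mathcal{E}$ and consider the following
subset of $\mathcal{K}_{\mathcal{H},\mathcal{E}}$:

$$\mathcal{K}_{SI} = \{G_{h,E} : (h, E) \in \Theta_{SI}\}. \tag{35}$$

The corresponding family of estimators is given by

$$\mathcal{F}(\mathcal{K}_{SI}) = \Bigl\{ \hat F_{h,E}(x) = \int G_{h,E}(t, x)\,Y(dt),\ (h, E) \in \Theta_{SI} \Bigr\}.$$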

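Remark 8. In view of Remark 7, we have

$$\bar\sigma_{h,E} = \|G\|_1 \sigma_{h,E} = \|G\|_1 \|G\|_2\, h_{\max}^{-(d-1)/2} [1/\sqrt{h_1}]; \tag{36}$$

$$\sigma_{\min} = h_{\max}^{-d/2} \|G\|_1 \|G\|_2, \qquad \sigma_{\max} = h_{\max}^{-(d-1)/2} \|G\|_1 \|G\|_2 [1/\sqrt{h_{\min}}]. \tag{37}$$

Note that $\bar\sigma_{h,E}$ does not depend on $E$.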

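Let $e(\sigma) = C_0 \sigma \sqrt{\ln \sigma}$, where $C_0$ is a numerical constant depending only on $d$ and $G$.
It is shown in Lemma 3 of Section 6 that $e(\sigma) \ge e_{\mathcal{K}_{SI}}(\sigma)$ for all $\sigma \in \Sigma_{\Theta_{SI}}$. The majorant
$Q$ is given by

$$Q(\sigma) = \sigma \bigl[ \chi_0 C_0 \sqrt{\ln \sigma} + \sqrt{1 + \chi_1 \ln(\sigma/\sigma_{\min})} \bigr]. \tag{38}$$

Note that assumption (E) is trivially fulfilled with $c_e = 2$ and

$$C_e = 2\bigl(1 + \sqrt{\ln 2 / \ln \sigma_{\min}}\bigr) \le 2\bigl(1 + \sqrt{2/d}\bigr).$$

Let $\hat F_{SI}$ be the estimator derived from the collection $\mathcal{F}(\mathcal{K}_{SI})$, in accordance with
our general selection rule, with the majorant (38), where $\chi_0 = 4(1 + \sqrt{\ln 2 / \ln \sigma_{\min}})$ and
$\chi_1 = 320r$.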

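Theorem 2. Fix some $0 < \alpha_{\max} < \infty$, let $h_{\min} = \varepsilon^2$, $h_{\max} = \varepsilon^{2/(2\alpha_{\max}+1)}$ and assume
that (32) holds with $l \ge \lfloor \alpha_{\max} \rfloor$.
Then, for any $0 < \alpha \le \alpha_{\max}$, $L > 0$ and $\varepsilon$ small enough, one has

$$\sup_{F \in \mathbb{F}_{SI}(\alpha, L)} \mathcal{R}_r[\hat F_{SI}; F] \le C L^{1/(2\alpha+1)} \Bigl( \varepsilon \sqrt{\ln\frac{1}{\varepsilon}} \Bigr)^{2\alpha/(2\alpha+1)}, \tag{39}$$

where $C$ depends only on $\alpha$, $G$, $d$ and $r$.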

Remark 9. If the parameters $\alpha$ and $L$ of the class $\mathbb{H}_1(\alpha, L)$ are known and the direction
vector $\omega$ is unknown, then we consider the following subset of $\mathcal{K}_{SI}$:

$$\mathcal{K}'_{SI} = \{G_{h,E},\ h = h^*,\ E \in \mathcal{E}\},$$

where $h^* = (h_1^*, h_{\max}, \ldots, h_{\max})$, $h_1^* = L^{-2/(2\alpha+1)} [\varepsilon \sqrt{\ln(1/\varepsilon)}]^{2/(2\alpha+1)}$. Under these circumstances,

$$\sigma_{h^*,E} = h_{\max}^{-(d-1)/2} (h_1^*)^{-1/2} \|G\|_2$$

does not depend on $E$ (see also Remark 8) and therefore

$$\sigma_{\min} = \sigma_{\max} = \sigma^* := \|G\|_1 \sigma_{h^*,E} = h_{\max}^{-(d-1)/2} (h_1^*)^{-1/2} \|G\|_2 \|G\|_1.$$

The corresponding majorant is given by $Q(\sigma^*) = \chi_0 C_0 \sigma^* \sqrt{\ln \sigma^*} + \sigma^*$ so that the first
term is dominating (all estimators in $\mathcal{F}(\mathcal{K}'_{SI})$ have the same variance). The resulting
selected estimator for this family will then satisfy the same upper bound of Theorem 2.
One can prove a lower bound that shows that even if $\alpha$ and $L$ are known, the rate of
convergence on the right-hand side of (39) cannot be improved.
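
The bandwidth $h_1^*$ in Remark 9 balances a bias term of order $L h^\alpha$ against a stochastic term of order $\varepsilon\sqrt{\ln(1/\varepsilon)}/\sqrt{h}$. The sketch below checks this arithmetic in the single relevant direction, ignoring constants and the factor coming from $h_{\max}$; the parameter values are hypothetical.

```python
import math


def noise_level(eps):
    """eps * sqrt(ln(1/eps)), the effective noise level appearing in (39)."""
    return eps * math.sqrt(math.log(1.0 / eps))


def oracle_bandwidth(alpha, L, eps):
    """h_1^* = L^{-2/(2 alpha + 1)} * [eps sqrt(ln(1/eps))]^{2/(2 alpha + 1)}."""
    return L ** (-2.0 / (2 * alpha + 1)) * noise_level(eps) ** (2.0 / (2 * alpha + 1))


def rate(alpha, L, eps):
    """Right-hand side of (39) without the constant C."""
    return L ** (1.0 / (2 * alpha + 1)) * noise_level(eps) ** (2 * alpha / (2 * alpha + 1))


alpha, L, eps = 1.5, 2.0, 1e-3
h1 = oracle_bandwidth(alpha, L, eps)
bias_term = L * h1 ** alpha
noise_term = noise_level(eps) / math.sqrt(h1)
```

Both terms coincide with the rate in (39), which is why this bandwidth is the natural benchmark when $\alpha$ and $L$ are known.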



4.3. Pointwise minimax estimation over a union of anisotropic Hölder classes

We start with the definition of the anisotropic Holder class of functions. 

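Definition 2. Let $\alpha = (\alpha_1, \ldots, \alpha_d)$, $\alpha_i > 0$ and $L > 0$. We say that $f : [-1/2, 1/2]^d \to
\mathbb{R}$ belongs to the anisotropic Hölder class $\mathbb{H}_d(\alpha, L)$ if for all $i = 1, \ldots, d$ and all $t \in
[-1/2, 1/2]^d$,

$$|D_i^m f(t)| \le L \qquad \forall m = 1, \ldots, \lfloor \alpha_i \rfloor$$

and

$$|D_i^{\lfloor \alpha_i \rfloor} f(t_1, \ldots, t_i + z, \ldots, t_d) - D_i^{\lfloor \alpha_i \rfloor} f(t_1, \ldots, t_i, \ldots, t_d)| \le L |z|^{\alpha_i - \lfloor \alpha_i \rfloor} \qquad \forall z \in \mathbb{R},$$

where $D_i^m f$ denotes the $m$th order partial derivative of $f$ with respect to the variable $t_i$
and $\lfloor \alpha_i \rfloor$ is the largest integer strictly less than $\alpha_i$.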
Fix $\gamma > 0$ and introduce the functional class

$$\mathbb{F}_{AH}(\gamma, L) = \bigcup_{\alpha \in A_\gamma} \mathbb{H}_d(\alpha, L), \quad \text{where } A_\gamma = \Bigl\{ \alpha : \sum_{i=1}^d 1/\alpha_i = 1/\gamma,\ \alpha_i > 0,\ i = 1, \ldots, d \Bigr\}.$$

Remark 10. It is well known (see, e.g., Kerkyacharian et al. (2001) and Bertin (2004))
that for any $\alpha \in A_\gamma$, the minimax rate of convergence on $\mathbb{H}_d(\alpha, L)$ is given by $\varepsilon^{2\gamma/(2\gamma+1)}$.
Thus, $\mathbb{F}_{AH}(\gamma, L)$ is the union of functional classes with prescribed accuracy of estimation.
Klutchnikoff (2005) showed that the rate $\varepsilon^{2\gamma/(2\gamma+1)}$ is not achievable on $\mathbb{F}_{AH}(\gamma, L)$ and
proved that the minimax rate of convergence on $\mathbb{F}_{AH}(\gamma, L)$ is given by

$$\varphi_\varepsilon := \bigl[ \varepsilon \sqrt{\ln\ln(1/\varepsilon)} \bigr]^{2\gamma/(2\gamma+1)}.$$
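
The two rates in Remark 10 can be compared numerically. The sketch below computes the index $\gamma$ from a smoothness vector via $1/\gamma = \sum_i 1/\alpha_i$ and evaluates both rates; the smoothness vector and the noise level are hypothetical, and the noise level must be below $e^{-1}$ for the iterated logarithm to be positive.

```python
import math


def effective_smoothness(alphas):
    """gamma defined by 1/gamma = sum_i 1/alpha_i."""
    return 1.0 / sum(1.0 / a for a in alphas)


def single_class_rate(eps, gamma):
    """Minimax rate on one anisotropic Hoelder class: eps^{2 gamma / (2 gamma + 1)}."""
    return eps ** (2 * gamma / (2 * gamma + 1))


def union_rate(eps, gamma):
    """phi_eps = [eps * sqrt(ln ln(1/eps))]^{2 gamma / (2 gamma + 1)}."""
    inner = eps * math.sqrt(math.log(math.log(1.0 / eps)))
    return inner ** (2 * gamma / (2 * gamma + 1))


gamma = effective_smoothness([2.0, 2.0])
slow = union_rate(1e-4, gamma)
fast = single_class_rate(1e-4, gamma)
```

For small noise levels the rate over the union is slower than the rate over a single class, but only by an iterated-logarithm factor.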

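In this section, we show that the application of our general selection rule with a specific
choice of the kernel collection $\mathcal{K}_{AH} \subset \mathcal{K}_{\mathcal{H},\mathcal{E}}$ leads to the minimax estimator on $\mathbb{F}_{AH}(\gamma, L)$.
Define the set of bandwidths $\mathcal{H}_\gamma \subset \mathcal{H}$,

$$\mathcal{H}_\gamma := \Bigl\{ h \in [h_{\min}, h_{\max}]^d : \prod_{i=1}^d h_i = \varphi_\varepsilon^{1/\gamma} \Bigr\}, \tag{40}$$

and consider the following subset of the family of kernels $\mathcal{K}_{\mathcal{H},\mathcal{E}}$:

$$\mathcal{K}_{AH} = \{G_{h,E} : (h, E) \in \Theta_{AH} := \mathcal{H}_\gamma \times \{I_d\}\}, \tag{41}$$

where $I_d$ is the $d \times d$ identity matrix.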

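The corresponding family of estimators is given by

$$\mathcal{F}(\mathcal{K}_{AH}) = \Bigl\{ \hat F_h(x) = \int G_h(t - x)\,Y(dt),\ h \in \mathcal{H}_\gamma \Bigr\}.$$

For all $h \in \mathcal{H}_\gamma$, we have

$$\bar\sigma_h = \|G\|_1 \sigma_h = \|G\|_1 \|G\|_2 \prod_{i=1}^d h_i^{-1/2} = \|G\|_1 \|G\|_2 \bigl[ \varepsilon \sqrt{\ln\ln(1/\varepsilon)} \bigr]^{-1/(2\gamma+1)}.$$

Thus, the set $\Sigma_{\Theta_{AH}}$ consists of the single point $\|G\|_1 \|G\|_2 [\varepsilon \sqrt{\ln\ln(1/\varepsilon)}]^{-1/(2\gamma+1)}$.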

Let $e(\sigma) = C_1 \sigma \sqrt{\ln\ln(h_{\max}/h_{\min})}$, where $C_1$ is a numerical constant depending only
on $d$ and $G$. Lemma 3 of Section 6 shows that $e(\sigma)$ is an upper bound on $e_{\mathcal{K}_{AH}}(\sigma)$.
Note that assumption (E) is trivially fulfilled with $c_e = C_e = 2$ and the majorant in our
procedure can be taken as follows:

$$Q(\sigma) = \sigma \bigl[ 1 + 4C_1 \sqrt{\ln\ln(h_{\max}/h_{\min})} \bigr]. \tag{42}$$
Let $\hat F_{AH}$ be the estimator derived from the collection $\mathcal{F}(\mathcal{K}_{AH})$ in accordance with our general selection rule, with the majorant (42).

Theorem 3. Fix $0 < \alpha_{\max} < \infty$. Let $h_{\min} = \varepsilon^2$, $h_{\max} = 1/2$ and assume that (32) holds with $l > \lfloor\alpha_{\max}\rfloor$. Then, for any $\alpha \in \mathcal{A}_\gamma \cap (0, \alpha_{\max}]^d$,

\[ \sup_{F \in \mathbb{H}_d(\alpha, L)} \mathcal{R}_r[\hat F_{AH}; F] \le C L^{1/(2\gamma+1)} \Big(\varepsilon\sqrt{\ln\ln\frac{1}{\varepsilon}}\Big)^{2\gamma/(2\gamma+1)}, \]

where $C$ depends only on $G$, $d$, $r$ and $\gamma$.



4.4. Global minimax estimation over isotropic Besov classes 

We begin with the definition of the isotropic Besov class of functions on $\mathcal{D}_0$. For all $x \in \mathcal{D}$ and $a \in \mathbb{R}^d$ such that $x + a \in \mathcal{D}$, define

\[ \Delta_a^1 F(x) = F(x + a) - F(x). \]

For any integer $l \ge 2$, let $\Delta_a^l F(x)$ denote the $(l-1)$-fold iteration of the operator $\Delta_a^1 F(x)$.



Definition 3. Let $s > 0$, $p \in [1, \infty)$ and $L > 0$ be given constants. Let $\mathbb{B}^s_{p,\infty}(d, L)$ denote the set of all functions satisfying

\[ \sup_a |a|^{-s}\,\|\Delta_a^{\lfloor s\rfloor+2} F(\cdot)\|_p \le L, \]

where $\lfloor s\rfloor$ is the largest integer strictly less than $s$. We call $\mathbb{B}^s_{p,\infty}(d, L)$ the isotropic Besov class of functions.

The considered classes were first introduced in approximation theory by Nikolskii (1975). They represent a particular case of the Besov classes $\mathbb{B}^s_{pq}(d, L)$ with $q = \infty$, which appear more often in the statistical literature. More general anisotropic Besov functional classes were considered in Kerkyacharian et al. (2001).

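On the class $\mathbb{B}^s_{p,\infty}(d, L)$, we introduce the maximal risk

\[ \mathcal{R}_r(\hat F) = \sup_{F \in \mathbb{B}^s_{p,\infty}(d,L)} \{\mathbb{E}_F\|\hat F - F\|_r^r\}^{1/r}, \quad r \in [1, \infty), \]

where $\hat F$ is an estimator of $F$. It is well known (Delyon and Juditsky (1996)) that

\[ \varphi_\varepsilon = \begin{cases} \varepsilon^{s/(s+d/2)}, & \text{if } sp > \dfrac{d(r-p)}{2}, \\[2mm] [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{s/(s+d/2)}\,[\ln(1/\varepsilon)]^{1/r}, & \text{if } sp = \dfrac{d(r-p)}{2}, \\[2mm] [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{(s-d/p+d/r)/(s-d/p+d/2)}, & \text{if } sp < \dfrac{d(r-p)}{2}, \end{cases} \]

is the minimax rate of convergence on $\mathbb{B}^s_{p,\infty}(d, L)$ if $sp \ne d(r-p)/2$ and differs from the minimax rate of convergence by a $\ln(1/\varepsilon)$-factor if $sp = d(r-p)/2$.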
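To make the three regimes of the rate concrete, the following sketch evaluates $\varphi_\varepsilon$ for given $(s, p, r, d)$. It is an illustration only: the function name is hypothetical, the exponent $1/r$ of the extra logarithm in the boundary case is taken from the reconstructed display (an assumption, since the extracted formula is partly illegible), and the boundary case is detected with a floating-point tolerance.

```python
import math


def besov_rate(eps, s, p, r, d):
    """Rate phi_eps on the isotropic Besov class; assumes 0 < eps < 1 and s > d/p."""
    log_term = math.log(1.0 / eps)
    critical = d * (r - p) / 2.0          # regimes are split by comparing s*p with this
    dense_exponent = s / (s + d / 2.0)
    if math.isclose(s * p, critical):
        # Boundary case: extra logarithmic factor (exponent 1/r is an assumption).
        return (eps * math.sqrt(log_term)) ** dense_exponent * log_term ** (1.0 / r)
    if s * p > critical:
        # Regular ("dense") case: classical rate without logarithms.
        return eps ** dense_exponent
    # Sparse case: the exponent depends on p and r.
    sparse_exponent = (s - d / p + d / r) / (s - d / p + d / 2.0)
    return (eps * math.sqrt(log_term)) ** sparse_exponent


eps = 1e-4
dense = besov_rate(eps, s=2.0, p=2.0, r=2.0, d=1)
sparse = besov_rate(eps, s=1.5, p=1.0, r=6.0, d=1)
boundary = besov_rate(eps, s=1.0, p=2.0, r=6.0, d=1)
```

For instance, with $d = 1$, $s = 3/2$, $p = 1$ and $r = 6$ one has $sp = 3/2 < d(r-p)/2 = 5/2$, so the sparse exponent $(s - d/p + d/r)/(s - d/p + d/2) = 2/3$ applies.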

In this section, we present the estimator which attains the rate $\varphi_\varepsilon$ on $\mathbb{B}^s_{p,\infty}(d, L)$. As before, this estimator is the output of our general selection procedure.

Let

\[ G(t) = G^*(t) := \sum_{j=1}^{\lfloor s\rfloor+2} (-1)^{j+1} \binom{\lfloor s\rfloor+2}{j} \frac{1}{j^d}\, g\Big(\frac{t}{j}\Big), \quad t \in \mathbb{R}^d, \]

where $g : \mathbb{R}^d \to \mathbb{R}$ is a bounded, compactly supported function with $\int g = 1$. It is easily seen that the function $G^*$ satisfies assumption (32).
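The normalization and moment properties of a kernel of this form rest on two binomial identities, which the following sketch checks in exact integer arithmetic. It assumes the kernel is the signed binomial combination $\sum_{j=1}^{l} (-1)^{j+1}\binom{l}{j} j^{-d} g(t/j)$ with $l = \lfloor s\rfloor + 2$, and it works in dimension $d = 1$, where $\int t^k j^{-1} g(t/j)\,dt = j^k \int u^k g(u)\,du$. The helper names are hypothetical.

```python
from math import comb


def gstar_coefficients(l):
    """Signed weights (-1)^(j+1) * C(l, j), j = 1..l, multiplying g(t/j)/j^d."""
    return {j: (-1) ** (j + 1) * comb(l, j) for j in range(1, l + 1)}


def moment_factor(l, k):
    """sum_j (-1)^(j+1) C(l, j) j^k: the factor multiplying the k-th moment of g (d = 1)."""
    return sum(c * j ** k for j, c in gstar_coefficients(l).items())


# k = 0 gives the total mass of the kernel relative to that of g;
# 1 <= k <= l - 1 gives the moments that cancel regardless of g.
mass_factors = {l: moment_factor(l, 0) for l in range(2, 7)}
vanishing = {l: [moment_factor(l, k) for k in range(1, l)] for l in range(2, 7)}
```

The factor for $k = 0$ equals one, so the combined kernel integrates to one whenever $g$ does, and the factors for $1 \le k \le l - 1$ vanish, so the corresponding moments of the combined kernel are zero for any choice of $g$.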
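Consider the following subset of $\mathcal{K}_{\mathcal{H},\mathcal{E}}$:

\[ \mathcal{K}_B = \{G^*_{h,E} : (h, E) \in \Theta_B := \mathcal{H}_B \times \{I_d\}\}, \]

where $\mathcal{H} \supset \mathcal{H}_B := \{h = (h_1, \ldots, h_d) \in \mathcal{H} : h_i = h_j,\ i, j = 1, \ldots, d\}$. Note that the family $\mathcal{K}_B$ consists of isotropic kernels having the same bandwidth in each direction. The corresponding family of estimators is given by

\[ \mathcal{F}(\mathcal{K}_B) = \Big\{\hat F_h : \hat F_h(x) = \int G^*_h(t - x)\,Y(dt),\ h \in \mathcal{H}_B\Big\}. \]

Let $\hat F_B$ be the estimator derived from the collection $\mathcal{F}(\mathcal{K}_B)$ in accordance with our general selection rule, where the majorant $Q$ is given by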




\[ Q(\sigma) = C(s, g)\,\sigma\sqrt{1 + \varkappa_1\ln(\sigma/\sigma_{\min})} =: C_1\sigma_{\min} Q^*(\sigma/\sigma_{\min}). \]

Here, $Q^*(z) = z\sqrt{1 + \ln z}$, $z \ge 1$, and $C_1 = C(s, g, d, r)$ is a numerical constant.
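Theorem 4. Suppose that $s > d/p$ and choose $h_{\min} = \varepsilon^2$ and

\[ h_{\max} = \begin{cases} \varepsilon^{1/(s+d/2)}, & \text{if } sp > \dfrac{d(r-p)}{2}, \\[2mm] 1/2, & \text{if } sp \le \dfrac{d(r-p)}{2}. \end{cases} \]

Then, for all $\varepsilon > 0$ small enough,

\[ \mathcal{R}_r^r(\hat F_B) \le C(d, s, p, r, g)\,\varphi_\varepsilon^r, \]

where $C(d, s, p, r, g) > 0$ is a numerical constant.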

Remark 11. The result described in Theorem 4 was first obtained by Delyon and Juditsky (1996) using wavelet techniques. Lepski et al. (1997) used the pointwise approach in order to develop minimax theory on the Besov balls. All results in Lepski et al. (1997) were obtained for the one-dimensional case $d = 1$ and the selection rule proposed there, being a modification of Lepski's method, cannot be directly extended to dimensions greater than one. Generalization to an arbitrary dimension was proposed in Kerkyacharian et al. (2001). This allowed minimax results to be developed for the anisotropic Besov-type functional classes. The class studied in this section can be viewed as a particular case of the anisotropic one and in Theorem 4, we reproduce the results from Lepski et al. (1997).

4.5. Mixture of problems 

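Consider the family of estimators

\[ \mathcal{F}(\mathcal{K}_{\mathcal{H},\mathcal{E}}) = \Big\{\hat F_{h,E} : \hat F_{h,E}(x) = \int G^*_{h,E}(t, x)\,Y(dt),\ h \in \mathcal{H},\ E \in \mathcal{E}\Big\} \]

and let $\hat F$ be the estimator derived from the collection $\mathcal{F}(\mathcal{K}_{\mathcal{H},\mathcal{E}})$ in accordance with our general selection rule, where the majorant $Q$ is given by

\[ Q(\sigma) = C_2\,\sigma\sqrt{1 + \ln(1/\varepsilon)}. \]

Here, $C_2 = C_2(d, r, g)$ is a numerical constant.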

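Theorem 5. Choose $h_{\min} = \varepsilon^2$, $h_{\max} = 1/2$ and suppose that $G$ satisfies assumption (32). Then, for all $\varepsilon > 0$ small enough,

1. under the conditions of Theorem 2 the estimator $\hat F$ is minimax, that is, it satisfies (39);

2. under the conditions of Theorem 3, we have

\[ \sup_{F \in \mathbb{H}_d(\alpha, L)} \mathcal{R}_r[\hat F; F] \le C L^{1/(2\gamma+1)} \Big(\varepsilon\sqrt{\ln\frac{1}{\varepsilon}}\Big)^{2\gamma/(2\gamma+1)}, \]

where $C$ depends only on $g$, $d$, $r$ and $\gamma$;

3. under the conditions of Theorem 4, we have

\[ \mathcal{R}_r(\hat F) \le C \begin{cases} [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{s/(s+d/2)}, & \text{if } sp > \dfrac{d(r-p)}{2}, \\[2mm] [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{s/(s+d/2)}\,[\ln(1/\varepsilon)]^{1/r}, & \text{if } sp = \dfrac{d(r-p)}{2}, \\[2mm] [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{(s-d(1/p-1/r))/(s-d(1/p-1/2))}, & \text{if } sp < \dfrac{d(r-p)}{2}, \end{cases} \]

where $C$ depends only on $g$, $d$, $r$ and $\gamma$.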

The proof of the theorem is along the same lines as the proofs of Theorems 2-4 and is 
hence omitted. 



Remark 12.

1. Comparing the results from Theorems 3 and 5, we conclude that the rate provided by the estimator $\hat F$ differs from the minimax rate of convergence on $\mathbb{F}_{AH}$ by a $[\sqrt{\ln(1/\varepsilon)/\ln\ln(1/\varepsilon)}]^{2\gamma/(2\gamma+1)}$-factor.

2. Comparing the results from Theorems 4 and 5, we conclude that the estimator $\hat F$ is minimax adaptive up to a $\sqrt{\ln(1/\varepsilon)}$-factor for all values of parameters $s$ and $p$. Moreover, $\hat F$ is a minimax adaptive estimator on the isotropic Besov balls of functions for all $s$ and $p$ such that $sp < d(r-p)/2$. A wavelet thresholding estimator which is nearly minimax adaptive over a scale of one-dimensional Besov balls was developed in Donoho et al. (1995).
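The size of the logarithmic loss in the first item can be checked directly from the two rates. The sketch below is illustrative only (function names are hypothetical, constants and the factor $L^{1/(2\gamma+1)}$ are ignored): it compares the rate $(\varepsilon\sqrt{\ln(1/\varepsilon)})^{2\gamma/(2\gamma+1)}$ of the mixture estimator with the rate $(\varepsilon\sqrt{\ln\ln(1/\varepsilon)})^{2\gamma/(2\gamma+1)}$ on the union of anisotropic Hölder classes.

```python
import math


def rate_on_union(eps, gamma):
    """Rate [eps * sqrt(ln ln(1/eps))]^(2g/(2g+1)) on the union of Hölder classes."""
    return (eps * math.sqrt(math.log(math.log(1.0 / eps)))) ** (2 * gamma / (2 * gamma + 1))


def rate_of_mixture(eps, gamma):
    """Rate [eps * sqrt(ln(1/eps))]^(2g/(2g+1)) attained by the mixture estimator."""
    return (eps * math.sqrt(math.log(1.0 / eps))) ** (2 * gamma / (2 * gamma + 1))


def log_loss(eps, gamma):
    """Closed form of the ratio: [ln(1/eps) / ln ln(1/eps)]^(g/(2g+1))."""
    ratio = math.log(1.0 / eps) / math.log(math.log(1.0 / eps))
    return ratio ** (gamma / (2 * gamma + 1))


gamma = 1.0
losses = {eps: rate_of_mixture(eps, gamma) / rate_on_union(eps, gamma)
          for eps in (1e-2, 1e-4, 1e-6)}
```

The ratio grows with $1/\varepsilon$ but only logarithmically, which is the price paid for not knowing in advance which of the three problems is being solved.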

5. Proof of Theorem 1 

In the proof below, $c, c_1, c_2, \ldots$ denote constants depending only on $r$, $c_e$ and $C_e$; they can be different on different occasions.

0°. We begin the proof by showing that under the premise of the theorem, the selection rule (22)-(24) is well defined, that is, there exists a measurable choice of $\hat\mu \in \Theta$ such that (23) is fulfilled.

It follows from Lemma 1 and assumptions (K2) and (K0) that there exists a separable modification of the Gaussian random process $\{\xi_{\mu,\nu}(x) - \xi_\nu(x),\ (\mu, \nu) \in \Theta \times \Theta\}$ that with probability one belongs to the $2m$-dimensional isotropic Hölder space with regularity index $0 < \tau < \gamma$ (see Lifshits (1995), Section 15). In addition, if (K2) holds and $F$ is uniformly bounded, then the integral $\int K_\nu(y, x) B_\mu(y)\,dy$, considered as a function of $(\mu, \nu)$, belongs to the $2m$-dimensional Hölder space with regularity index $\gamma$. Then, by (14) and (13), we obtain that $|\hat F_{\mu,\nu}(x) - \hat F_\nu(x)|$ is continuous in $(\mu, \nu)$. It also follows from (26) and (27) that $\sigma_\nu(x)$ (and $\bar\sigma_\nu(x)$) are continuous functions of $\nu \in \Theta$. Hence, $Q(\bar\sigma_\nu)$ is also continuous in $\nu$; thus, the random function under the supremum on the right-hand side of (22) is continuous in $(\mu, \nu)$. $\hat R_\mu$ is then a random variable for every $\mu \in \Theta$.

We now describe the construction of the measurable choice fj,£ satisfying (23). Let 

Rw = \F^ v -F v \-\eQ{5 v ), /j,,v£Q. 

For any 5 > 0, there exists a simple function, say i? Mi „, on x such that \R^, V — Rn,v\ < 
S for all fj,, v £ 0. Then, clearly, 

l^-jg^a v^e, (43) 

where we have defined R^ := sn\> v . a ^ >d ^ R^.y We now observe that i? M is a simple func- 
tion of fi £ and define /t = arginf Me e{-R/j + zQi&fi)}- Since the function R^ assumes a 
finite number of values and Q((T At ) is continuous in /x, fi is measurable and belongs to 0. 
(43) then implies (23) if 6 is chosen to be \eQ(a m i n ). 
1°. We write 



{E F \F - F\ r y/ r < {E F \F - F| r l(a A < er M . )Y' r + {E F \F - F\ r l(&^ > ^)} 1/r (44) 
and our current goal is to bound the two terms on the right-hand side.
2°. By the triangle inequality, 

\h - F|l(a A < <v ) < [|F A ,„, -F fi \ + \F^, -F fl .\ + |f> - < er M .) 

=: [Ji + J 2 + J 3 ]l(a A <^«)- 

By the definitions of -F^.i,, F^ and 

Ji < < [|B A)/4 . - S A | + e|£ Ai „. - £ A |]l(a A < M 
< suplB^. - B v | +e sup 

+ e sup — 

Therefore, in view of Lemma A.l in the Appendix, 



{EM r l(^<^*)} 1/r <^.+ £ {l sup -i v A 



1/r 



<5 )1 .+aE{e(v) + 2v} (45) 

where we have used the definitions of /x* and Q(-). 
Furthermore, 

J 2 1(^a {\F^ -f„,\- i e g(^,)}i(^ <«V) + 2^(<V) 

<i? A + ±eQ(^.) 

<V + |eQ(^.) + ^ 

where the second inequality follows from (22) and the third is a consequence of (23). 
Hence, 



{E F J r 2 l(cr A < <V )} 1/r < {E F iC > O)} 1 ^ + §eQ(cv ) + 



Because 



V = sup [|F M .,„ -F v \- \eQ{a v )] 



SUP -&| - |<5(^) 



(46) 



([•]+ = max{-,0}), 



we obtain 



{E F JJl(a A <^.)} 1/r 
<B^ +e{E



sup \£n*, v -£ v \-hQ(o- v 



< Bp* + |eg(5- M , ) + 5 + cea n 

< ceQ(o- M »), 



(47) 



where the second inequality follows from Lemma A. 2 (see Appendix) and the last in- 
equality follows from the definitions of /i* and Q(-). 
The bound on {Ep J^l(ap, < o^')} 1 ^ is immediate: 

{E F J 3 r l(a A <a^)} 1/r <S M . +e[E|^.|1 1/r < A*' +ce<r M . <ceQ(a^). (48) 

Combining (45), (47) and (48), we obtain that there exists a constant c depending only 
on r such that 



{E F |F A - F\ r l(a^ < <V)} 1/r < csQ(a^). 



(49) 



3°. To bound the second term on the right-hand side of (44), we proceed as follows. 
Define the events A k = {2 k ~ 1 a fl » < Bp, < 2 fc <r Al .}, k = 1,2, and let fi k <E Q F (K, e ) be 
such that the corresponding estimators F^ k G J-(JCq) have the following properties: 

(i) e£ fc =var{F Mfc } = 2 fc *,,.; 

(ii) B^<\eQ{a^ k ). 

The existence of estimators F llk satisfying (i) and (ii) is guaranteed by the fact that 
<3 f (JCq) is non-empty and /i* 6 <3p(JCq). We can then write 



oo 

=-J2^ Ilk + l2 - k + h,k]l(A k ). 



(50) 



k=l 



We have 



< 



o e< 9(^J+ £ Sup \£u,n h -Zu\ 



1(40, 



where the second inequality follows from the definition of fik- Hence, by the Cauchy- 
Schwarz inequality and Lemma A.l, 



{E F /[,l(A fc )} 



l/r 
< ^eQ(a hlk )¥ 1 J r (A k ) + elEp sup - ^ \ [F F (A k )]V 2r

(51) 

<\eQ(a, k )¥f\A k )+ce{e{a, k )+~a, k W F (A k )] 1/2r 
<ceQ(a, k )[P F (A k )]^ 2r . 

Furthermore, 

I 2 ,kl(A k ) < [\F fiifik — F llk | — \sQ{a^))l{A k ) + | e Q(er w )l(A fc ) 

< [£ A + ± £ Q(a M J]l(A fc ) 

< [A ( ,. + |eQ(?„J + 6]l(A fc ) 

< [Vl(fl„. >0) + |eQ(? M J + i]l(^*), 

where the second inequality follows from the definition of and the third from the 
definition of /t and the monotonicity of Q(-). Arguing as in (46) and (47), and using the 
Cauchy-Schwarz inequality, we obtain 

{E F Il k l(A k )}^ r 

< {EpBfclfa. > 0)} 1 / 2r [¥ F (A k )] 1 / 2r + (feQ^J + S)[P F (A k )]^ r 

< + C a7 min )[P F (A fc )] 1/2r + (|eQ(5 M J + <5)[P F (A fc )] 1/r 
KceQia^mr^Ak)} 1 ^. 



(52) 



Finally, 

{E F Il k l{A k )Y/ r < B^ F {A k tl r +e{E\^ k ry/ 2r [V F (A k )} 1 ^ 

< IsQ^lF^A^ + ce^MAk)] 1 ^ (53) 
<cEQ{a, k W F {A k )Y' 2 \ 

where we have used the definition of fi k . Combining (51), (52) and (53), we obtain 

{EWT ifc l(40} 1/p + {E F % tk l(A k )}V r + {E F r 3tk l(A k )}V r < ceQ{a, k )W F {A k )Y^. (54) 

In order to complete the proof, we need to bound F F (A k ) from above. 

4°. Note that for any integer 1 < m < k, by definition of /t, we have A k C {ap, > o' jUji _ TO }. 
Hence, 

A k C >CT Mfc _ m } 

C + eg(CT A ) < Rft k _ m + eQ{a flk _J + 6} 

(55) 
C {iW m + £Q(^ k - m ) > ieQ(^A) - 6)
C {i?, Ifc _ m + eQ(* m *-J > ieQ(cr Mfc _J -5}, 

where the second inclusion is by (25) and the third is by the monotonicity of Q(-). 
Furthermore, using assumption (E), we have 

o<3(Vi) -<2(5> fe _J 



= o x O e (*/*fc-i) + o^fc-i V 1 + Xl ln "^"^ ~ x oe(CT Mfc _ m ) - _ to 4/1 + xi In ^ 

^ ^ \ CT m in V (7 m j 



> 



rC™" 1 - 1 



x e(cr /J 



1 + ln- 



> 2 X oe(CT (tlfe _ m ) + iCT Mfc _ m Jl + xiln^ 



:<2(5> fe _ m ), 



provided that m > 3 V [1 + (ln3/lnc e )]. Choosing 

m = m := \1 + (ln3/ lnc e )] V 3, 

we obtain that 

1 



< 



<4 



£ Sup 

v:au>Bu.,_ 



sup 

<>5> fc -m 
xi /64 



(56) 



>-sQ(a^ ma )} (57) 



>0 



where the first inequality is by (55), the second is by the bound on i? M (see (46)), the 
definition of S and the monotonicity of Q(-), the third is in view of the definition of the 
/Life's and the fourth inequality follows from Lemma A. 2. 

5°. Now using (54) and (57), we bound {E F |F A - F^l^ > a^)} 1 ^; see (50). 
Let $m_0$ be given by (56) and, for the sake of brevity, set $\gamma = \varkappa_1/64$. Then,



°° /_ \7/(2r) 00 

J2 ^F{A k )] 1/2r <ci 2 ro °T/( 2r ) ^ 2 

fe=m + l V / fc=m + l 

Moreover, using assumption (E), we obtain 

00 

fc-mo + 1 



-fc 7 /(2r) < ( 



7 /(2r) 



2 m 07/(2r) 



fc=mo + l 



^o e (^J + ^\/ 1 + ^i ln 



7/(2r) 



< Cixq2 



m j/(2r) ( "mm 



7 /(2r) 



fc-mo + 1 



+ C22m0 7/(2r)~ 



< c 2«io7/(2?') 



7 /(2r) 
7 /(2r) 



7 /(2r) 00 



(58) 



fc^mo + 1 



V- 2 fc- fe7 /(2r) l 1 + Xlin ^ZE 
" cr min 



xoe^. ) + cr M . a / 1 + xi In — 



because, by our choice of xi, 7 = xi/64 > 2r(lnC e /ln2), which implies that the sums on 
the right-hand side are finite. In addition, 



£<?(^J=£ 



fc=i 



fc=i 



x e(o- M J + CT Mfc Wl + xiln 



fc=i 



< xbe(er„. ) £ C e fe + er„. Jl + xi In £ 2 fc + 2cr M . ^^i hi 2 ^ 2 fc Vfc (59) 



fc=i 



fe=i 



here, we have used assumption (E). 

Therefore, combining (54), (50), (58) and (59), we finally obtain 

{E F |F A -^ri(a A >a^)} 1/r 

mo 00 
fc— 1 fc-mo + 1 

This inequality and (49) lead to the statement of the theorem. 
6. Proofs of Theorems 2, 3 and 4

The proofs of Theorems 2, 3 and 4 use upper bounds on the function $e_{\mathcal{K}_\Theta}(\cdot)$ defined in (19). Therefore, we begin this section with two lemmas establishing such bounds. We then present the proofs of Theorems 2, 3 and 4.

6.1. Bounds on function $e_{\mathcal{K}_\Theta}(\cdot)$

For fixed $x \in \mathcal{D}_0$, consider the random process $\{\eta_{\mu,\nu}(x),\ \mu, \nu \in \Theta\}$ given by

\[ \eta_{\mu,\nu}(x) = \xi_{\mu,\nu}(x) - \xi_\nu(x) = \int [K_{\mu,\nu}(t, x) - K_\nu(t, x)]\,W(dt), \quad \mu, \nu \in \Theta. \]

For $\lambda, \lambda' \in \Theta$, define

\[ \rho(\lambda, \lambda') = \|\bar K_\lambda - \bar K_{\lambda'}\|_{2,\infty}, \quad \bar K_\lambda(\cdot, x) = K_\lambda(\cdot, x)/\|K_\lambda(\cdot, x)\|_2; \]

\[ \tilde\rho(\lambda, \lambda') = \sup_x \big|1 - \|K_\lambda(\cdot, x)\|_2/\|K_{\lambda'}(\cdot, x)\|_2\big|. \]

The next lemma establishes an upper bound on the intrinsic semi-metric of the process $\eta_{\mu,\nu}$.

Lemma 1. Let

\[ \rho[(\mu, \nu), (\mu', \nu')] := \sqrt{\mathbb{E}|\eta_{\mu,\nu}(x) - \eta_{\mu',\nu'}(x)|^2}. \]

(i) Then, for all $\mu, \nu, \mu', \nu' \in \Theta$, we have

\[ \rho[(\mu, \nu), (\mu', \nu')] \le 2\bar\sigma_\nu(x)[\rho(\nu, \nu') + \tilde\rho(\nu, \nu')] + \bar\sigma_\mu(x)[\rho(\mu, \mu') + \tilde\rho(\mu, \mu')]. \tag{60} \]

(ii) In addition, suppose that = ®j=i ©j ^ — (^lj • • • i®l)> where 0j G 0j, j = 
1,...,L GivenOj G i=li ^- 0j andXjeQj, write Kg. x = K$ u ... ) e j _ 1 ,\ j ,e J+u ...,$ l ■ Then, 

1 

p(A,A')< £ sup||AV- a- ,vlk=o VA,A'e0. (61) 

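Proof. (i) We have

\begin{align*}
\rho[(\mu, \nu), (\mu', \nu')] &= \|(K_{\nu,\mu}(\cdot, x) - K_\nu(\cdot, x)) - (K_{\nu',\mu'}(\cdot, x) - K_{\nu'}(\cdot, x))\|_2 \\
&\le \|K_{\nu,\mu}(\cdot, x) - K_{\nu',\mu'}(\cdot, x)\|_2 + \|K_\nu(\cdot, x) - K_{\nu'}(\cdot, x)\|_2 \\
&\le \|K_{\nu,\mu}(\cdot, x) - K_{\nu',\mu}(\cdot, x)\|_2 + \|K_{\nu',\mu}(\cdot, x) - K_{\nu',\mu'}(\cdot, x)\|_2 + \|K_\nu(\cdot, x) - K_{\nu'}(\cdot, x)\|_2 \\
&= \|K_{\nu,\mu}(\cdot, x) - K_{\nu',\mu}(\cdot, x)\|_2 + \|K_{\mu,\nu'}(\cdot, x) - K_{\mu',\nu'}(\cdot, x)\|_2 + \|K_\nu(\cdot, x) - K_{\nu'}(\cdot, x)\|_2,
\end{align*}

where the last line follows from assumption (K1).

Thus, to prove (60), it suffices to show that for all $\lambda, \lambda' \in \Theta$,

\[ \sup_{\theta \in \Theta} \|K_{\lambda,\theta}(\cdot, x) - K_{\lambda',\theta}(\cdot, x)\|_2 \le \bar\sigma_\lambda(x)(\rho(\lambda, \lambda') + \tilde\rho(\lambda, \lambda')). \tag{62} \]

Let us prove (62). Indeed, using the Minkowski inequality, we get, for all $\theta \in \Theta$,

\begin{align*}
\|K_{\lambda,\theta}(\cdot, x) - K_{\lambda',\theta}(\cdot, x)\|_2 &= \Big\{\int \Big(\int K_\theta(y, x)[K_\lambda(t, y) - K_{\lambda'}(t, y)]\,dy\Big)^2 dt\Big\}^{1/2} \\
&\le \int |K_\theta(y, x)|\,\|K_\lambda(\cdot, y) - K_{\lambda'}(\cdot, y)\|_2\,dy.
\end{align*}

Moreover, for all $y$,

\begin{align*}
\|K_\lambda(\cdot, y) - K_{\lambda'}(\cdot, y)\|_2 &\le \|K_\lambda(\cdot, y)\|_2\,\|\bar K_\lambda(\cdot, y) - \bar K_{\lambda'}(\cdot, y)\|_2 + \big|\|K_\lambda(\cdot, y)\|_2 - \|K_{\lambda'}(\cdot, y)\|_2\big| \\
&\le \|K_\lambda(\cdot, y)\|_2\,(\rho(\lambda, \lambda') + \tilde\rho(\lambda, \lambda')).
\end{align*}

It remains to note that by definition,

\[ \bar\sigma_\nu(x) = \sup_{\theta \in \Theta} \int |K_\theta(y, x)|\,\|K_\nu(\cdot, y)\|_2\,dy \vee \sigma_\nu(x). \]

(ii) The statement follows immediately from the triangle inequality. □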


Using the general results of Lemma 1, we now establish an upper bound on the intrinsic semi-metric of the Gaussian process $\eta_{\mu,\nu}$ with index set $\Theta = \mathcal{H} \times \mathcal{E}$.

Lemma 2. Let IChx be the family of kernels defined in (34). Then, for all p,v,v' £ 
Ti. x E, we have 

d 



p[(fi,u),(fi,u'))<2a v (x)\M[Y^ 



d , 2\ 1/2 

hi 

1 

h 



T K 
1 Ut 



i = l 



Mdh-l\E-E'\ 2 



Proof. Let v = (h, E), v' = (h' , E 1 ) £H x £. Our current goal is to bound p(y, v') from 
above. For this purpose, we apply Lemma 1 with = TL x £. 
In view of (33), (34) and Remark 7, we have 

\\ a h,E^h,E - o-^, E Gh',Eh 
\\G\



2 1 ' 



n ,1/2 



G 



u=i 



hi' 1 h d 



(Hm^) g (I 



IIGUa 1 ' 



£i id 

'•••'77 

2 >. 1/2 

di I 



1-1 ' ■ • ■ ' 77 *d 



^d 

'(/ 



1 "d 
2 \ 1/2 



<\\G\\Aj G{h,...,t d )-G(^-t 



+HGH2 1 l-nvw 



1/2 



. i=l 



< ik 



. 1=1 



1 - 



1/2 



1/2 d 



i=l 



i=l 



i=l 



here, we have taken into account (32) and the fact that ||G||2 > 1. 
Furthermore, if H = diag(/ii, . . . , /i^}, then 

\\ a h,EGh,E — &h,E'Gh,E'\\2 = ^bII^^B — Gh^B'jb 

= l|G|| 2 - i n^ /2 { / |G h (^t)-G fc ((^) r t)| 2 d* 

i=l ^ 

= HG^ 1 \G{H- l E T t) - G{H-\E') T t)\ 2 dt^ 
Combining (63), (64) and using (61), we obtain 



1/2 



1/2 



7^y)<Mj^ 
U=i 


hj 

1 - 

K 


2 . 1/2 

) + 


d 

i=i 




+ Mdh-l\E-E'\ 2 


Observe, also, that 














p(u,v 


0=11- 


- (Th,E/<Th<,E'\ = 


d 

»=i 


\fK/hi 
□



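Lemma 3. There exist constants $C_1$, $C_2$ and $C_3$ depending on $G$ and $d$ such that the following statements hold:

(i) if $\mathcal{K}_{SI}$ is the family of kernels defined in (35) and $\sigma > 1$, then

\[ e_{\mathcal{K}_{SI}}(\sigma) \le C_1\sigma\sqrt{\ln\sigma} \quad \forall\sigma \in \Sigma_{\Theta_{SI}}; \]

(ii) if $\mathcal{K}_{AH}$ is the family of kernels defined in (41) and $\ln(h_{\max}/h_{\min}) > 1$, then

\[ e_{\mathcal{K}_{AH}}(\sigma) \le C_2\sigma[\sqrt{\ln\ln(h_{\max}/h_{\min})} + 1] \quad \forall\sigma \in \Sigma_{\Theta_{AH}}; \]

(iii) if $\mathcal{K}_B$ is the family of kernels defined in Section 4.4 and $\sigma > 1$, then

\[ e_{\mathcal{K}_B}(\sigma) \le C_3\sigma\sqrt{\ln(1 + \ln(\sigma/\sigma_{\min}))}. \]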

Proof. Throughout the proof, $c, c_1, c_2, \ldots$ denote positive constants depending only on $G$ and $d$. They can differ from appearance to appearance.

1°. Let $\mathcal{K}_{SI}$ be the family of kernels defined in (35). Let $\sigma_{\min}$ and $\sigma_{\max}$ be as defined in (37) and fix $\sigma \in [\sigma_{\min}, \sigma_{\max}]$. Here, $\nu = (h, E)$ and the index set of the corresponding random process $\{\xi_{\mu,\nu} - \xi_\nu\}$ is given by $\{\nu : \bar\sigma_\nu \le \sigma\} = [h_\sigma, h_{\max}] \times \mathcal{E}$, where $h_\sigma = c_1 h_{\max}^{-d+1}\sigma^{-2}$ (see (36)).

Lemma 2 implies that the following upper bounds holds on the semi-metric psi of this 
process: 



Mdh~ l \E-E'\ 

for all v, v' such that <j v V cv < a. Note, also, that by (18), 

sup var(£ Mi „ - £„) < 2 sup u v = 2a. 











< 2ct |m 




+ 2 








hi 



(65) 



V.(T 1 ,<~.(7 



V. (T.,<~.<T 



The number of balls Ni(Q of radius £ in semi-metric C2cr{|l — hi/h[\ + |1 — h[/hi\} 
covering the set [/i,j,/i max ] = [c\h~^ 1 <7~ 2 ^ h max ] admits the following upper bound: 

Ni({) < ln(c3<r 2 /i^ lax ) In 1 (1 + C4<^(7 1 ). 

The number of balls A^(C) of radius C m the semi-metric c$a\E — E'\i covering £ does 
not exceed (c6cr/i~ 1 £ _1 ) d_1 = (cjo- 3 h^xC^ 1 )^ 1 ■ Thus, the total number of balls covering 
[hp, /imax] x £ equals iVi(C)A^2(C)- Hence, using the bounds on Ni(Q and ^(C), (65) and 
the bound on the supremum of a Gaussian process in terms of the Dudley integral (see, 
e.g., Lifshits (1995), Section 14), we conclude that 



[vflniVi(C) + vflnJVaCC)] dC < cct\/W. 
The first statement of the lemma is proved.

2°. For the family of kernels K-ah of Section 4.3, we have v = h £ 7i 7 , where 7i 7 is 
defined in (40). Note that the set ^s A h consists of the single point 

It follows from Lemma 2 that the semi-metric pah of this process admits the following 
upper bound: 

l 2\ Va 



p A H[(^u),( f i,v , )}<2a*M[^2 



v.»=l 



1 - 



The number of balls N(() of radius C m the above semi- metric covering the index set 
Hj does not exceed 

Hence, applying Lemma A. 4 (see Appendix), we obtain 



VlnA(C)dC < c 2< 7*[ A /lnln(/ lmax //i min ) + 1]. 

3°. For family of kernels Kb, we have v = h\ £ [h m - ln , h max ] and 
a hl = \\G*\\ l \\G*hh- d/ \ v^ = \\G*UG*hh^L\ a max = \\G*\\ 1 \\G*\\ 2 h^ 2 . 



According to Lemma 2, 

p B [(H,v),(ji,i/)]<2aiMd 



1 — — 

h> 



for all v = h\, v' — h! x such that cr^ V 07,^ < a. 

For fixed a £ [cr min , cr max ] , we set h a = (||G*||i||G*|| 2 cr _1 ) d/2 . The number of balls N(() 
of radius £ in the semi- metric ps covering the set [h a , h max ] does not exceed 



Hence, 



N(0<ln(c 1 h max a d / 2 )lii- 1 (l + c 2 [Ca~ 1 } 1 / d ). 



J Vln N(() dC < a7^1n(l + m(a/<7 min )). 



□ 



6.2. Proof of Theorem 2 



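Throughout the proof, $c, c_1, c_2, \ldots$ stand for constants depending only on $d$, $G$ and $r$.

We show that $\Theta_F(\mathcal{K}_{SI})$ is non-empty for any $F \in \mathbb{F}_{SI}(\alpha, L)$. Assume that $F(t) = f(\omega_0^T t)$, $\omega_0 \in \mathbb{S}^{d-1}$, where $f \in \mathbb{H}_1(\alpha, L)$.

First, we note that in view of (36) and (38), there exist constants $c_1$, $c_2$ such that

\[ c_1\sqrt{\frac{1}{h_1}\ln\frac{1}{h_1}} \le Q(\bar\sigma_{h,E}) \le c_2\sqrt{\frac{1}{h_1}\ln\frac{1}{h_1}} \quad \forall(h, E) \in \mathcal{H}_1 \times \mathcal{E}. \tag{66} \]

Consider the family of kernels $\mathcal{K}^0_{SI} := \{G_{h,E} : h \in \mathcal{H}_1, E = E_0\} \subset \mathcal{K}_{SI}$, where $E_0$ is a fixed orthogonal matrix whose first column is $\omega_0$. Clearly, for any estimator associated with kernel $G_{h,E_0}$ from $\mathcal{K}^0_{SI}$, we have the following bound on the bias: $|B_{h,E_0}(x)| \le L h_1^\alpha$ for all $x$. Moreover, by (17) and the fact that $M(\mathcal{K}_{SI}) = \|G\|_1$, we obtain

\[ \bar B_{h,E_0}(x) \le \|G\|_1 \sup_y |B_{h,E_0}(y)| \le \|G\|_1 L h_1^\alpha \quad \forall h \in \mathcal{H}_1. \]

Let $h^* = (h_1^*, h_{\max}, \ldots, h_{\max})$ be defined by the balance equation

\[ \|G\|_1 L (h_1^*)^\alpha = \tfrac{1}{2}\varepsilon Q(\bar\sigma_{h^*,E_0}). \]

It then follows from (66) that

\[ h_1^* = c_3[(\varepsilon/L)\sqrt{\ln(L/\varepsilon)}]^{2/(2\alpha+1)}. \tag{67} \]

Note that for $\varepsilon$ small enough, $h_1^* \in [h_{\min}, h_{\max}]$ and, by definition of $h_1^*$, $\bar B_{h^*,E_0} \le \tfrac{1}{2}\varepsilon Q(\bar\sigma_{h^*,E_0})$.

We now show that $(h^*, E_0) \in \Theta_F(\mathcal{K}_{SI})$. To that end, fix $\sigma \in [\bar\sigma_{h^*,E_0}, \sigma_{\max}]$. Consider the estimator associated with parameter $(h', E_0)$ such that $\bar\sigma_{h',E_0} = \sigma$. Hence, by (36), $h_1' = c\sigma^{-2} \le c\bar\sigma_{h^*,E_0}^{-2} = h_1^*$ so that, in view of the monotonicity of the function $Q(\cdot)$,

\[ \bar B_{h',E_0} \le \|G\|_1 L (h_1')^\alpha \le \|G\|_1 L (h_1^*)^\alpha = \tfrac{1}{2}\varepsilon Q(\bar\sigma_{h^*,E_0}) \le \tfrac{1}{2}\varepsilon Q(\bar\sigma_{h',E_0}). \]

This shows that $(h^*, E_0) \in \Theta_F(\mathcal{K}_{SI})$. Then, applying Theorem 1, we obtain

\[ \{\mathbb{E}_F|\hat F_{SI}(x) - F(x)|^r\}^{1/r} \le c\,\varepsilon Q(\bar\sigma_{h^*,E_0}). \]

Substitution of (67) completes the proof.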



6.3. Proof of Theorem 3 

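If $F \in \mathbb{F}_{AH}(\gamma, L)$, then there exists $\alpha^* = (\alpha_1^*, \ldots, \alpha_d^*) \in \mathcal{A}_\gamma$ such that $F \in \mathbb{H}_d(\alpha^*, L)$. Note that under the premise of the theorem, we have, for any $h \in \mathcal{H}_\gamma$, that

\[ \varepsilon Q(\bar\sigma_h) = c_1\varepsilon\Big(\prod_{i=1}^d h_i^{-1/2}\Big)\sqrt{\ln\ln(h_{\max}/h_{\min})} = c_2[\varepsilon\sqrt{\ln\ln(1/\varepsilon)}]^{2\gamma/(2\gamma+1)} = c_2\varphi_\varepsilon. \]

Define $h^* = (h_1^*, \ldots, h_d^*)$ by the following relation:

\[ d\|G\|_1 L (h_i^*)^{\alpha_i^*} = \tfrac{1}{2}\varepsilon Q(\bar\sigma_{h^*}) = \tfrac{c_2}{2}\varphi_\varepsilon \quad \forall i = 1, \ldots, d. \tag{68} \]

For $\varepsilon$ small enough, $h^* \in [h_{\min}, h_{\max}]^d$ and, clearly, $h^* \in \mathcal{H}_\gamma$. Let $\hat F_{h^*}$ be the estimator from $\mathcal{F}(\mathcal{K}_{AH})$ associated with kernel $G_{h^*}$ (see (33)). We have the following upper bound on the bias of this estimator: $\sup_x |B_{h^*}(x)| \le L\sum_{i=1}^d (h_i^*)^{\alpha_i^*}$. Moreover, by (17),

\[ \bar B_{h^*}(x) \le \|G\|_1 \sup_x |B_{h^*}(x)| \le \|G\|_1 L \sum_{i=1}^d (h_i^*)^{\alpha_i^*} \le d\|G\|_1 L (h_1^*)^{\alpha_1^*} = \tfrac{1}{2}\varepsilon Q(\bar\sigma_{h^*}), \tag{69} \]

where the last equality on the right-hand side follows from (68). Because the set $\Sigma_{\Theta_{AH}}$ is the singleton $\{\bar\sigma_{h^*}\}$, inequality (69) implies that $\Theta_F(\mathcal{K}_{AH})$ is non-empty. Application of Theorem 1 yields

\[ \{\mathbb{E}_F|\hat F_{AH}(x) - F(x)|^r\}^{1/r} \le c_3\varepsilon Q(\bar\sigma_{h^*}) = c_4\varphi_\varepsilon. \]

The theorem is thus proved.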

6.4. Proof of Theorem 4 

Before turning to the proof of the theorem, let us make some remarks which will be used 
in the subsequent proof. 

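1. Let $s[q] = [s - d/p + d/q] \wedge s$, $q \in [1, \infty]$. Then, due to the inclusion theorem for Besov balls (Nikolskii (1975)), we have

\[ \mathbb{B}^s_{p,\infty}(d, L) \subset \mathbb{B}^{s[q]}_{q,\infty}(d, L). \tag{70} \]

In particular,

\[ \mathbb{B}^s_{p,\infty}(d, L) \subset \mathbb{B}^{s[\infty]}_{\infty,\infty}(d, L) \subset C(\mathcal{D}_0). \tag{71} \]

The last inclusion follows from the assumption of the theorem that $s - d/p > 0$. It also implies $s[q] > 0$.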

2. Let us introduce the following notation. For any $\mathfrak{h} \in (0, h_{\max}]$, let $h(\mathfrak{h}) = (\mathfrak{h}, \ldots, \mathfrak{h}) \in \mathcal{H}_B$ and define

\begin{align*}
B_{\mathfrak{h}}(\cdot) &:= \int G^*_{h(\mathfrak{h})}(t - \cdot)F(t)\,dt - F(\cdot) =: B_\mu(\cdot), \quad \mu = h(\mathfrak{h}), \\
B_{\mathfrak{h},\mathfrak{h}'}(\cdot) &:= B_{\mu,\nu}(\cdot) - B_\nu(\cdot) = \int G^*_{h(\mathfrak{h}')}(y - \cdot)B_{\mathfrak{h}}(y)\,dy, \quad \nu = h(\mathfrak{h}'), \tag{72} \\
\bar B_{\mathfrak{h}}(\cdot) &:= \sup_{\mathfrak{h}' \in (0, h_{\max}]} |B_{\mu,\nu}(\cdot) - B_\nu(\cdot)| \vee |B_{\mathfrak{h}}(\cdot)|.
\end{align*}

Let $K_b(x)$, $x \in \mathcal{D}_0$, be the cube centered at $x$ with side length equal to $b$. The possible values of $b$ are found from the condition $x + b \in \mathcal{D}$ for all $x \in \mathcal{D}_0$ and, later, $\sup_b$ denotes the supremum over this set.

Assuming, without loss of generality, that the support of the function $g$ belongs to $[-1/(2(\lfloor s\rfloor + 2)), 1/(2(\lfloor s\rfloor + 2))]^d$ (it also implies that the support of $G^*$ belongs to $[-1/2, 1/2]^d$), we obtain from (72) that for all $x \in \mathcal{D}_0$ and $\mathfrak{h} \in (0, h_{\max}]$,

\[ \bar B_{\mathfrak{h}}(x) \le C_1(s, g)\sup_b \frac{1}{b^d}\int_{K_b(x)} |B_{\mathfrak{h}}(y)|\,dy =: C_1(s, g)B^{(\max)}_{\mathfrak{h}}(x), \tag{73} \]

where $C_1(s, g)$ is a constant depending only on $\|g\|_\infty$ and $s$. To obtain (73), we used the fact that $B_{\mathfrak{h}}(x)$ is a continuous (even uniformly continuous) function of $\mathfrak{h}$, since $F$ is uniformly continuous on $\mathcal{D}_0$ and $G^*$ is bounded. The uniform continuity of $F$ follows from (71) and the compactness of $\mathcal{D}_0$. Note that $B^{(\max)}_{\mathfrak{h}}(\cdot)$ is the Hardy-Littlewood maximal function of $B_{\mathfrak{h}}(\cdot)$ (see, e.g., Wheeden and Zygmund (1977), Chapter 9, Section 3).
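3. The operator $\Delta_a^l$ has the following representation:

\[ \Delta_a^l F(x) = \sum_{j=0}^{l} \binom{l}{j}(-1)^{j+l}F(x + ja), \quad l \ge 1. \]

Therefore,

\[ (-1)^{l+1}\Delta_a^l F(x) = \sum_{j=1}^{l} \binom{l}{j}(-1)^{j+1}F(x + ja) - F(x). \]

Using this formula and the definition of the function $G^*$, we obtain, for any $\mathfrak{h} \in (0, h_{\max}]$,

\begin{align*}
B_{\mathfrak{h}}(x) &= \int G^*(u)\{F(x + u\mathfrak{h}) - F(x)\}\,du \\
&= \sum_{j=1}^{\lfloor s\rfloor+2}(-1)^{j+1}\binom{\lfloor s\rfloor+2}{j}\frac{1}{j^d}\int g(u/j)\{F(x + u\mathfrak{h}) - F(x)\}\,du \\
&= \int g(v)\Big[\sum_{j=1}^{\lfloor s\rfloor+2}(-1)^{j+1}\binom{\lfloor s\rfloor+2}{j}F(x + vj\mathfrak{h}) - F(x)\Big]dv \\
&= (-1)^{\lfloor s\rfloor+3}\int g(v)\,\Delta^{\lfloor s\rfloor+2}_{v\mathfrak{h}}F(x)\,dv.
\end{align*}

Therefore,

\[ |B_{\mathfrak{h}}(x)| \le C_2(g)\int_{[-1/2,1/2]^d} |\Delta^{\lfloor s\rfloor+2}_{v\mathfrak{h}}F(x)|\,dv =: C_2(g)\,b_{\mathfrak{h}}(x) \quad \forall x \in \mathcal{D}_0, \tag{74} \]

where $C_2(g)$ is a constant depending only on $\|g\|_\infty$. Here, we used the fact that the support of $g$ belongs to $[-1/2, 1/2]^d$.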
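The binomial representation of the iterated difference operator used in this remark can be verified in exact rational arithmetic. The sketch below is illustrative (one-dimensional, with hypothetical helper names): it builds the $l$th order difference by iterating $\Delta_a F(x) = F(x + a) - F(x)$, compares it with the binomial sum, and checks the rearranged identity $(-1)^{l+1}\Delta_a^l F(x) = \sum_{j=1}^{l}\binom{l}{j}(-1)^{j+1}F(x + ja) - F(x)$.

```python
from fractions import Fraction
from math import comb


def iterated_difference(f, x, a, l):
    """l-th order difference obtained by iterating Delta_a f(x) = f(x + a) - f(x)."""
    if l == 0:
        return f(x)
    return iterated_difference(f, x + a, a, l - 1) - iterated_difference(f, x, a, l - 1)


def binomial_difference(f, x, a, l):
    """Closed form: sum_{j=0}^{l} C(l, j) (-1)^(j+l) f(x + j a)."""
    return sum(comb(l, j) * (-1) ** (j + l) * f(x + j * a) for j in range(l + 1))


def shifted_average_minus_value(f, x, a, l):
    """sum_{j=1}^{l} C(l, j) (-1)^(j+1) f(x + j a) - f(x)."""
    return sum(comb(l, j) * (-1) ** (j + 1) * f(x + j * a) for j in range(1, l + 1)) - f(x)


def cubic(t):
    return t ** 3 - 2 * t


x0, step = Fraction(1, 3), Fraction(1, 2)
third = iterated_difference(cubic, x0, step, 3)    # 3! * step^3 for a cubic
fourth = iterated_difference(cubic, x0, step, 4)   # vanishes: degree 3 < order 4
```

A difference of order $l$ annihilates polynomials of degree below $l$, which is why the bias of a kernel built from such combinations is controlled by the difference $\Delta^{\lfloor s\rfloor+2}$ alone.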
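Finally, from (73) and (74), we get

\[ \bar B_{\mathfrak{h}}(x) \le C_3(s, g)\,b^{(\max)}_{\mathfrak{h}}(x) \quad \forall\mathfrak{h} \in (0, h_{\max}],\ \forall x \in \mathcal{D}_0, \tag{75} \]

where, as before, $b^{(\max)}_{\mathfrak{h}}(\cdot)$ is the Hardy-Littlewood maximal function of $b_{\mathfrak{h}}(\cdot)$ and $C_3(s, g) = C_1(s, g)C_2(g)$.

The next important property of $b^{(\max)}_{\mathfrak{h}}(\cdot)$ follows from the definition:

\[ \sup_{\mathfrak{h} \in [r/2, r]} b^{(\max)}_{\mathfrak{h}}(x) \le 2^d\,b^{(\max)}_r(x) \quad \forall r \in (0, h_{\max}],\ \forall x \in \mathcal{D}_0. \tag{76} \]

Indeed, for any $\mathfrak{h} \in [r/2, r]$, we have

\[ b_{\mathfrak{h}}(\cdot) := \mathfrak{h}^{-d}\int_{[-\mathfrak{h}/2, \mathfrak{h}/2]^d} |\Delta_u^{\lfloor s\rfloor+2}F(\cdot)|\,du \le 2^d r^{-d}\int_{[-r/2, r/2]^d} |\Delta_u^{\lfloor s\rfloor+2}F(\cdot)|\,du = 2^d\,b_r(\cdot). \]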

4. Since HGj^^lb = |jG*||2f) _ti / 2 =: cr^, the majorant can be rewritten in the form 

Q(W :=QK) ^C^Q^Vax/f)), 

where Q^O) = z d/ Vl + dmz, z>l. 

Thus, for our particular problem, the set Qf{I^b) is 

Q F (K B ) = QUKb) = {*) e (0, Vax] : By (a) < s^W). W < t)}, xe V . 
Note that (75) and (71) imply that for all F G W p oo {d,L) and all f) G (0, /i max ], 

||S l J oo <C 3 ( S ,. 9 )||6; max) || 0O = G 3 ( S ,. 9 )||^|| O o<2' i C 3 ( S ,.g) sup IIA^+VlL 

■uG[-l,l] d 

<2 d C< 3 ( S ,. 9 )[f,Vdr d/p sup|a^^ 

a 

Therefore, there exists a constant c, depending only on s^g^p^d and L, such that 

(0,e c ]ce^(/C B ) VF£M s PiOC (d,L), VxGX» . (77) 
Putting, for all F € Bp iOC (d, L) and all ieP , 

h i r(x)=sup{h:f)Gef(/CB)} ) 
we obtain from Theorem 1 that 

K u (F)<C r £ r sup f Q r (t, F (x))dx. 

F&& s P .. x (d,,L)JV 
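Proof of Theorem 4. As already mentioned, the function $\bar B_{\mathfrak{h}}(\cdot)$ is continuous in $\mathfrak{h}$. Evidently, the function $Q(\mathfrak{h})$ is also continuous. Therefore, in view of the definition of $\mathfrak{h}_F(x)$, we have

\[ \bar B_{\mathfrak{h}_F(x)}(x) = \tfrac{1}{2}\varepsilon Q(\mathfrak{h}_F(x)) \quad \forall x \in \mathcal{D}_0 : \mathfrak{h}_F(x) < h_{\max}. \tag{78} \]

Let $k_{\max} \in \mathbb{N}^*$ be chosen in such a way that $2^{-k_{\max}}h_{\max} < h_{\min} \le 2^{1-k_{\max}}h_{\max}$, where $h_{\min} = \varepsilon^c$. Set

\begin{align*}
\Gamma_0 &= \{x \in \mathcal{D}_0 : \mathfrak{h}_F(x) = h_{\max}\}, \\
\Gamma_k &= \{x \in \mathcal{D}_0 : 2^{-k}h_{\max} < \mathfrak{h}_F(x) \le 2^{1-k}h_{\max}\}, \quad k = 1, \ldots, k_{\max}.
\end{align*}

Note that the sets $(\Gamma_k, k = 0, \ldots, k_{\max})$ form a partition of $\mathcal{D}_0$ since $\mathfrak{h}_F(x) \ge h_{\min}$ for all $x \in \mathcal{D}_0$, in view of (77). Therefore,

\[ I(F) := \varepsilon^r\int_{\mathcal{D}_0} Q^r(\mathfrak{h}_F(x))\,dx = \varepsilon^r\sum_{k=0}^{k_{\max}}\int_{\Gamma_k} Q^r(\mathfrak{h}_F(x))\,dx =: \sum_{k=0}^{k_{\max}} I_k(F). \tag{79} \]

Let $q_k \in (1, r]$, $k \in \mathbb{N}^*$, be a sequence of real numbers, to be specified later. Then, in view of (78), we get, for all $k = 1, \ldots, k_{\max}$,

\[ I_k(F) = \varepsilon^{r-q_k}2^{q_k}\int_{\Gamma_k} Q^{r-q_k}(\mathfrak{h}_F(x))\,(\bar B_{\mathfrak{h}_F(x)}(x))^{q_k}\,dx. \tag{80} \]

It follows from (75) and (76) that

\[ \bar B_{\mathfrak{h}_F(x)}(x) \le 2^d C_3(s, g)\,b^{(\max)}_{2^{1-k}h_{\max}}(x) \quad \forall x \in \Gamma_k. \tag{81} \]

Moreover,

\[ Q(\mathfrak{h}_F(x)) \le Q(2^{-k}h_{\max}) = C h_{\max}^{-d/2}2^{kd/2}\sqrt{1 + kd\ln 2} \quad \forall x \in \Gamma_k. \tag{82} \]

Thus, we have, from (80), (81) and (82), that

\[ I_k(F) \le C_1[\varepsilon h_{\max}^{-d/2}]^{r-q_k}\,2^{kd(r-q_k)/2}\,k^{(r-q_k)/2}\,\|b^{(\max)}_{2^{1-k}h_{\max}}\|_{q_k}^{q_k}. \tag{83} \]

Here and later, we denote by $C_1, C_2, \ldots$ the constants depending on $d$, $s$, $p$, $r$, $g$, $L$, but independent of $F$ and $\varepsilon$.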

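We have, for all $\mathfrak{h} \in (0, h_{\max}]$ and all $F \in \mathbb{B}^s_{p,\infty}(d, L)$,

\begin{align*}
\|b^{(\max)}_{\mathfrak{h}}\|_{q_k}^{q_k} &\le C(q_k)\|b_{\mathfrak{h}}\|_{q_k}^{q_k} = C(q_k)\Big\|\int_{[-1/2,1/2]^d} |\Delta^{\lfloor s\rfloor+2}_{v\mathfrak{h}}F|\,dv\Big\|_{q_k}^{q_k} \\
&\le C(q_k)\Big[\int_{[-1/2,1/2]^d} \|\Delta^{\lfloor s\rfloor+2}_{v\mathfrak{h}}F\|_{q_k}\,dv\Big]^{q_k} \\
&\le C_2\Big[\mathfrak{h}^{s[q_k]}\sup_a |a|^{-s[q_k]}\|\Delta_a^{\lfloor s\rfloor+2}F\|_{q_k}\Big]^{q_k} \le C_3\,\mathfrak{h}^{q_k s[q_k]}. \tag{84}
\end{align*}

Let us comment on the proof of (84). The first inequality follows from Wheeden and Zygmund (1977), Theorem 9.16, where the constant $C(q_k)$ depends only on $q_k$ and, moreover, $\sup_{1 < q \le r} C(q) < \infty$ for any fixed $r$. The second inequality follows from the Minkowski inequality for integrals. The last inequality is a consequence of (70).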

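Substituting $\mathfrak{h} = 2^{1-k}h_{\max}$ in (84), we finally obtain, from (79) and (83), that for any $F \in \mathbb{B}^s_{p,\infty}(d, L)$,

\[ I(F) \le C_4\Big[(\varepsilon h_{\max}^{-d/2})^r + \sum_{k=1}^{k_{\max}} (\varepsilon h_{\max}^{-d/2})^{r-q_k}(h_{\max})^{q_k s[q_k]}\,2^{-k\lambda_k}\,k^{(r-q_k)/2}\Big], \tag{85} \]

\[ \lambda_k = q_k s[q_k] - (r - q_k)d/2, \quad k = 1, \ldots, k_{\max}. \]

Let us now consider three cases.

Case 1. $sp > d(r-p)/2$. Choose $q_k = r \wedge p$ for all $k = 1, \ldots, k_{\max}$ and recall that $h_{\max} = \varepsilon^{1/(s+d/2)}$. Therefore,

\[ s[q_k] = s, \quad \lambda_k = \lambda := s(r \wedge p) - \frac{d(r - r \wedge p)}{2} > 0. \]

Moreover,

\[ \varepsilon h_{\max}^{-d/2} = h_{\max}^{s} = \varepsilon^{s/(s+d/2)} = \varphi_\varepsilon. \]

Thus, we obtain from (85), for any $F \in \mathbb{B}^s_{p,\infty}(d, L)$,

\[ I(F) \le C_4\,\varphi_\varepsilon^r\Big[1 + \sum_{k=1}^{k_{\max}} 2^{-k\lambda}\,k^{(r - r \wedge p)/2}\Big] \le C_5\,\varphi_\varepsilon^r. \]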



Case 2. $sp = d(r-p)/2$. Choose $q_k = p$ for all $k = 1, \ldots, k_{\max}$ and recall that $h_{\max} = 1/2$. Therefore,

\[ s[q_k] = s, \quad \lambda_k = 0. \]

Taking into account that $k_{\max} \sim \ln(1/\varepsilon)$, we obtain from (85) that for any $F \in \mathbb{B}^s_{p,\infty}(d, L)$,

\[ I(F) \le C_6[\varepsilon^r + \varepsilon^{r-p}[\ln(1/\varepsilon)]^{(r-p)/2+1}] \le C_7\,\varphi_\varepsilon^r. \]

The last inequality follows from the relation $r - p = \frac{2sr}{2s+d}$.
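Case 3. $sp < d(r-p)/2$. Recall that $h_{\max} = 1/2$ and $\varphi_\varepsilon = [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{(s-d(1/p-1/r))/(s-d(1/p-1/2))}$.

Let $\mathfrak{h}_\varepsilon = [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{z}$, $z = 1/(s - d/p + d/2)$, and define

\[ q_k = \begin{cases} p, & \text{if } 1 \le k \le k^*, \\ r, & \text{if } k > k^*, \end{cases} \]

where $k^* \in \mathbb{N}^*$ is chosen from the relation $2^{-(k^*+1)} < \mathfrak{h}_\varepsilon \le 2^{-k^*}$. Noting that

\[ \lambda_k = \begin{cases} \lambda_1 := sp - \dfrac{d(r-p)}{2} < 0, & \text{if } 1 \le k \le k^*, \\[2mm] \lambda_2 := (s - d/p + d/r)r, & \text{if } k \ge k^* + 1, \end{cases} \]

and again taking into account that $k_{\max} \sim \ln(1/\varepsilon)$, we get, from (85), that for any $F \in \mathbb{B}^s_{p,\infty}(d, L)$,

\begin{align*}
I(F) &\le C_8\Big[\varepsilon^r + [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{r-p}\sum_{k=1}^{k^*} 2^{-k\lambda_1} + \sum_{k=k^*+1}^{k_{\max}} 2^{-k\lambda_2}\Big] \\
&\le C_9\big[[\varepsilon\sqrt{\ln(1/\varepsilon)}]^{r-p}\,2^{-k^*\lambda_1} + 2^{-(k^*+1)\lambda_2}\big] \\
&\le C_9\big[[\varepsilon\sqrt{\ln(1/\varepsilon)}]^{r-p}\,\mathfrak{h}_\varepsilon^{\lambda_1} + \mathfrak{h}_\varepsilon^{\lambda_2}\big].
\end{align*}

To obtain the second inequality, we used the fact that $\lambda_1 < 0$ and $\lambda_2 > 0$. It remains to note that, in view of the definition of $\mathfrak{h}_\varepsilon$,

\[ [\varepsilon\sqrt{\ln(1/\varepsilon)}]^{r-p}\,\mathfrak{h}_\varepsilon^{\lambda_1} = \mathfrak{h}_\varepsilon^{\lambda_2} = \varphi_\varepsilon^r. \]

□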
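The exponent bookkeeping in Cases 2 and 3 can be checked in exact rational arithmetic. The sketch below is an illustration with hypothetical function names; it assumes the quantities as reconstructed here, namely $z = 1/(s - d/p + d/2)$, $\lambda_1 = sp - d(r-p)/2$, $\lambda_2 = (s - d/p + d/r)r$, and verifies that $(r - p) + z\lambda_1 = z\lambda_2 = r \cdot (s - d/p + d/r)/(s - d/p + d/2)$, as well as the Case 2 relation $r - p = 2sr/(2s + d)$ when $sp = d(r-p)/2$.

```python
from fractions import Fraction as Fr


def case3_exponents(s, p, r, d):
    """Exponents of eps*sqrt(ln(1/eps)) in the two Case 3 terms and in phi_eps^r."""
    z = 1 / (s - Fr(d) / p + Fr(d, 2))
    lam1 = s * p - Fr(d) * (r - p) / 2
    lam2 = (s - Fr(d) / p + Fr(d) / r) * r
    rate_exponent = (s - Fr(d) / p + Fr(d) / r) / (s - Fr(d) / p + Fr(d, 2))
    return (r - p) + z * lam1, z * lam2, r * rate_exponent


def case2_relation(s, p, r, d):
    """Return (r - p, 2 s r / (2 s + d)); the two agree when s*p = d*(r - p)/2."""
    return Fr(r - p), 2 * s * r / (2 * s + d)


# Sparse regime example: s*p = 3/2 < d*(r - p)/2 = 5/2 and s > d/p.
first_term, second_term, target = case3_exponents(Fr(3, 2), 1, 6, 1)

# Boundary regime example: s*p = 2 = d*(r - p)/2.
lhs, rhs = case2_relation(Fr(1), 2, 6, 1)
```

Both terms in the final bound of Case 3 therefore carry the same power of $\varepsilon\sqrt{\ln(1/\varepsilon)}$ as $\varphi_\varepsilon^r$, which is what the choice of $\mathfrak{h}_\varepsilon$ is designed to achieve.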



Appendix 



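Proof of Proposition 1. We have, for all $\mu, \nu$ in the index set and all $x \in \mathcal{D}_0$,

\begin{align*}
B_{\mu,\nu}(x) &= \int_{\mathcal{D}} K_{\mu,\nu}(t,x) F(t)\,dt - F(x) = \int_{\mathcal{D}} \Big[ \int_{\mathcal{D}_1} K_\mu(t,y) K_\nu(y,x)\,dy \Big] F(t)\,dt - F(x) \\
&= \int_{\mathcal{D}_1} K_\nu(y,x) \Big[ \int_{\mathcal{D}} K_\mu(t,y) F(t)\,dt \Big] dy - F(x) \\
&= \int_{\mathcal{D}_1} K_\nu(y,x) \Big[ \int_{\mathcal{D}} K_\mu(t,y) \{F(t) - F(y) + F(y)\}\,dt \Big] dy - F(x) \\
&= \int_{\mathcal{D}_1} K_\nu(y,x) B_\mu(y)\,dy + \int_{\mathcal{D}_1} K_\nu(y,x) F(y)\,dy - F(x) \\
&= \int_{\mathcal{D}_1} K_\nu(y,x) B_\mu(y)\,dy + \int_{\mathcal{D}} K_\nu(y,x) F(y)\,dy - F(x) \\
&= \int_{\mathcal{D}_1} K_\nu(y,x) B_\mu(y)\,dy + B_\nu(x).
\end{align*}

The fifth and sixth equalities follow from the second and first lines of (7), respectively. □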
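A discrete analogue makes the bias decomposition concrete. In the sketch below, kernels are square arrays `K[t][x]` whose columns sum to one (mirroring $\int K(t,x)\,dt = 1$), the composed kernel is $K_{\mu,\nu}[t][x] = \sum_y K_\mu[t][y] K_\nu[y][x]$, and the check is that $B_{\mu,\nu}(x) = \sum_y K_\nu[y][x] B_\mu(y) + B_\nu(x)$. The grid, the two kernel shapes and the test function are illustrative assumptions; the sketch ignores the distinction between the integration domains that appears in the continuous proof.

```python
import math

def normalize_columns(raw):
    """Rescale each column so that sum_t K[t][x] = 1 (discrete analogue of a kernel)."""
    n = len(raw)
    sums = [sum(raw[t][x] for t in range(n)) for x in range(n)]
    return [[raw[t][x] / sums[x] for x in range(n)] for t in range(n)]

def compose(k_mu, k_nu):
    """K_{mu,nu}[t][x] = sum_y K_mu[t][y] * K_nu[y][x]."""
    n = len(k_mu)
    return [[sum(k_mu[t][y] * k_nu[y][x] for y in range(n)) for x in range(n)]
            for t in range(n)]

def bias(kernel, f):
    """B[x] = sum_t K[t][x] * f[t] - f[x]."""
    n = len(f)
    return [sum(kernel[t][x] * f[t] for t in range(n)) - f[x] for x in range(n)]

# Illustrative inputs (assumed): a test function and two kernel shapes on a 7-point grid.
n = 7
f = [math.sin(0.9 * t) + 0.1 * t * t for t in range(n)]
k_mu = normalize_columns([[1.0 / (1 + abs(t - x)) for x in range(n)] for t in range(n)])
k_nu = normalize_columns([[math.exp(-0.5 * (t - x) ** 2) for x in range(n)] for t in range(n)])

b_mu, b_nu = bias(k_mu, f), bias(k_nu, f)
b_mu_nu = bias(compose(k_mu, k_nu), f)
recombined = [sum(k_nu[y][x] * b_mu[y] for y in range(n)) + b_nu[x] for x in range(n)]

# The bias of the composed kernel equals the smoothed bias of K_mu plus the bias of K_nu.
assert all(abs(a - b) < 1e-12 for a, b in zip(b_mu_nu, recombined))
```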
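Lemma A.1. For any $\mu$ in the index set and $r > 0$, we have

\[
\Big\{ E \sup_{\nu : \sigma_\nu \le \sigma_\mu} |\xi_{\mu,\nu} - \xi_\nu|^{r} \Big\}^{1/r} \le C_r \{ e_{\mathcal{K},\varepsilon}(\sigma_\mu) + 2\sigma_\mu \},
\]

where

\[
C_r = \begin{cases} 1, & r < 1, \\ \Big[ 8r \int_0^\infty (t \vee 1)^{r-1} \exp(-t^2/2)\,dt \Big]^{1/r}, & r \ge 1. \end{cases}
\]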



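Proof. For brevity in the proof below, we will write $e(\cdot) = e_{\mathcal{K},\varepsilon}(\cdot)$.

The statement for $r < 1$ follows immediately from the definition of the function $e(\cdot)$. If $r \ge 1$, then

\begin{align*}
E \sup_{\nu : \sigma_\nu \le \sigma_\mu} |\xi_{\mu,\nu} - \xi_\nu|^{r} &= r \int_0^\infty t^{r-1} P\Big\{ \sup_{\nu : \sigma_\nu \le \sigma_\mu} |\xi_{\mu,\nu} - \xi_\nu| > t \Big\}\,dt \\
&\le e^{r}(\sigma_\mu) + r \int_{e(\sigma_\mu)}^\infty t^{r-1} P\Big\{ \sup_{\nu : \sigma_\nu \le \sigma_\mu} |\xi_{\mu,\nu} - \xi_\nu| > t \Big\}\,dt \\
&= e^{r}(\sigma_\mu) + r \int_0^\infty [t + e(\sigma_\mu)]^{r-1} P\Big\{ \sup_{\nu : \sigma_\nu \le \sigma_\mu} |\xi_{\mu,\nu} - \xi_\nu| - e(\sigma_\mu) > t \Big\}\,dt \\
&\le e^{r}(\sigma_\mu) + 2r \int_0^\infty [t + e(\sigma_\mu)]^{r-1} \exp\Big\{ -\frac{t^2}{2 \sup_{\nu : \sigma_\nu \le \sigma_\mu} \sigma^2_{\mu,\nu}} \Big\}\,dt,
\end{align*}

where the last inequality follows from the fact that $e(\sigma) \ge e_0(\sigma)$ and Lemma A.3 below; recall that $\operatorname{var}(\xi_{\mu,\nu} - \xi_\nu) = \sigma^2_{\mu,\nu}$. Inequality (18) implies that $\sup_{\nu : \sigma_\nu \le \sigma_\mu} \sigma_{\mu,\nu} \le 2\sigma_\mu$; hence, continuing the preceding chain of inequalities, we obtain

\begin{align*}
&\le e^{r}(\sigma_\mu) + 2r \int_0^\infty [t + e(\sigma_\mu)]^{r-1} \exp\Big\{ -\frac{t^2}{8\sigma_\mu^2} \Big\}\,dt \\
&= e^{r}(\sigma_\mu) + 4r\sigma_\mu \int_0^\infty \{2t\sigma_\mu + e(\sigma_\mu)\}^{r-1} \exp(-t^2/2)\,dt \\
&\le e^{r}(\sigma_\mu) + 4r\sigma_\mu [2\sigma_\mu + e(\sigma_\mu)]^{r-1} \int_0^\infty (t \vee 1)^{r-1} \exp(-t^2/2)\,dt \\
&\le 8r [2\sigma_\mu + e(\sigma_\mu)]^{r} \int_0^\infty (t \vee 1)^{r-1} \exp(-t^2/2)\,dt.
\end{align*}

This completes the proof.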



□ 
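The last step of the chain in the proof of Lemma A.1 can be checked numerically. The sketch below, under the assumption $r \ge 1$, evaluates $\int_0^\infty (t \vee 1)^{r-1} \exp(-t^2/2)\,dt$ with a midpoint rule and confirms that $e^r + 4r\sigma(2\sigma + e)^{r-1} \int \le 8r(2\sigma + e)^r \int$ on a small grid of values. The function names, the truncation point and the grid are illustrative choices, with `sigma` and `e` standing for $\sigma_\mu$ and $e(\sigma_\mu)$.

```python
import math

def tail_integral(r, upper=12.0, steps=20000):
    """Midpoint-rule value of the integral of max(t, 1)^(r-1) * exp(-t^2/2) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += max(t, 1.0) ** (r - 1) * math.exp(-t * t / 2)
    return total * h

def chain_sides(r, sigma, e):
    """Left and right sides of the final inequality in the chain, for r >= 1."""
    integral = tail_integral(r)
    lhs = e ** r + 4 * r * sigma * (2 * sigma + e) ** (r - 1) * integral
    rhs = 8 * r * (2 * sigma + e) ** r * integral
    return lhs, rhs

for r in (1, 2.5, 4):
    for sigma in (0.1, 1.0, 3.0):
        for e in (0.0, 0.5, 2.0):
            lhs, rhs = chain_sides(r, sigma, e)
            assert lhs <= rhs
```

The inequality has room to spare: the integral is at least $\sqrt{\pi/2} > 1$, so $e^r \le r(2\sigma + e)^r \int$, while $4r\sigma(2\sigma + e)^{r-1} \le 2r(2\sigma + e)^r$.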
Lemma A.2. Let assumption (E) hold and let the function $Q$ be given by (21) with $\varkappa_0 > 2C_e$ and $\varkappa_1 > 64$. Then, for any $\mu$ in the index set and $t > 0$,



sup 



1^-61 - T^QO^) 



>U <4 



3<i/64 



exp 



1652 



(86) 



Moreover, if $\varkappa_1 > 128r$, then



E 



sup |£ M ,„ - \Q(cr v ) 



r n 1/r 



< C(T m i n , 



(87) 



where C is a constant depending only on r. 



Proof. As previously, we will write $e(\cdot) = e_{\mathcal{K},\varepsilon}(\cdot)$.

Define $N_k = \{\nu : 2^{k-1}\sigma_\mu < \sigma_\nu \le 2^{k}\sigma_\mu\}$ for $k = 1, 2, \ldots$ and write



sup 



J fe=1 UeJv h 



(88) 



Since $Q(\sigma)$ is monotone increasing in $\sigma$,



sup 



1^-^1-^0(5 



>t 



<P sup |^-6l>*+o<9(2 fc_1 5 M ) 



< 



sup |^,, - 61 - e(2%) > t + -Q(2 fc - i a Al ) - e(2%) 



sup - £„| - e(2 ft a (li ) > i + -x^* -1 ^) 



2 k - 2 & fl ill + Xi In- ^ -e(2 fc a 



2 fe -!CT 



/ 2 fe_1 <7 

<P<j sup |^, t ,-6|-e(2 fc ^)>t + 2 fe - 2 ^Wl + ^ 1 ln i 

li/:fr„<2 k a u V Cmin 



where the last inequality follows by assumption (E) and choice of the constant $\varkappa_0$. By (18),

\[
\sup_{\nu : \sigma_\nu \le 2^{k}\sigma_\mu} \operatorname{var}(\xi_{\mu,\nu} - \xi_\nu) = \sup_{\nu : \sigma_\nu \le 2^{k}\sigma_\mu} \sigma^2_{\mu,\nu} \le 2^{2k+1}\sigma_\mu^2;
\]

hence, using Lemma A.3, we obtain



sup 



1 (t + a k ) 2 



where we have denoted for brevity

\[
a_k = 2^{k-2}\sigma_\mu \sqrt{1 + \varkappa_1 \ln(2^{k-1}\sigma_\mu/\sigma_{\min})}, \qquad b_k = 2^{k+1/2}\sigma_\mu.
\]
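The role of the constant 64 becomes visible once $a_k^2/(2b_k^2)$ is computed explicitly. The sketch below assumes $a_k = 2^{k-2}\sigma_\mu\sqrt{1 + \varkappa_1 \ln(2^{k-1}\sigma_\mu/\sigma_{\min})}$ and $b_k = 2^{k+1/2}\sigma_\mu$ with $\sigma_\mu \ge \sigma_{\min}$, and checks that $\exp\{-a_k^2/(2b_k^2)\} = e^{-1/64}\,(2^{k-1}\sigma_\mu/\sigma_{\min})^{-\varkappa_1/64}$, which is at most $2^{-(k-1)\varkappa_1/64}(\sigma_{\min}/\sigma_\mu)^{\varkappa_1/64}$. The function names and numerical values are illustrative assumptions.

```python
import math

def a_k(k, sigma_mu, sigma_min, kappa1):
    """a_k = 2^(k-2) * sigma_mu * sqrt(1 + kappa1 * ln(2^(k-1) * sigma_mu / sigma_min))."""
    ratio = 2 ** (k - 1) * sigma_mu / sigma_min
    return 2 ** (k - 2) * sigma_mu * math.sqrt(1 + kappa1 * math.log(ratio))

def b_k(k, sigma_mu):
    """b_k = 2^(k + 1/2) * sigma_mu."""
    return 2 ** (k + 0.5) * sigma_mu

def gaussian_factor(k, sigma_mu, sigma_min, kappa1):
    """exp(-a_k^2 / (2 b_k^2)), the part of the Gaussian tail bound that is free of t."""
    return math.exp(-a_k(k, sigma_mu, sigma_min, kappa1) ** 2 / (2 * b_k(k, sigma_mu) ** 2))

# Illustrative values (assumed), with sigma_mu >= sigma_min so that the logarithm is nonnegative.
sigma_mu, sigma_min, kappa1 = 0.3, 0.05, 128.0
for k in range(1, 8):
    ratio = 2 ** (k - 1) * sigma_mu / sigma_min
    factor = gaussian_factor(k, sigma_mu, sigma_min, kappa1)
    assert math.isclose(factor, math.exp(-1 / 64) * ratio ** (-kappa1 / 64), rel_tol=1e-9)
    assert factor <= 2 ** (-(k - 1) * kappa1 / 64) * (sigma_min / sigma_mu) ** (kappa1 / 64)
```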



Noting that 



exp 



{t + a k ? 



< 



exp 



< exp 



2b t) 


exp 


r »2 


t 2 


k 




16a 2 





«i/64 



we have 



sup 



> 



1 1 < 2 1 -( fe - 1 )^/ 64 



exp 



16a 2 \ a 



xi/64 



(89) 



Summing up over k = 1, 2, . . . and taking into account (88), we arrive at (86). 
We now prove (87). Using (89), we have 



E 



sup - -Q(5v) 



<- 2 l-(fe-l)^i/64 



xi/64 



r / i r 1 exp <! — 



dt 



< c2 



-(fe-l)xi/64 f "mm 



xi /64 



This implies that 



sup ICamz-Ci'I- ~Q(&u) 



r \ 1/r f co 



< 



.fe=l 



sup - 



1/r 



< ccr„ 



xi/(64r) 



< C(T n 



E2- 

.fe=o 



1/r 



fcxri/64 



because of our choice of $\varkappa_1$.



□ 



The next result can be found in, for example, Adler and Taylor (2007). 
Lemma A.3 (Borell, Tsirelson and Sudakov). Let $X_t$, $t \in T$, be a centered Gaussian process, a.s. bounded on $T$. Then, for all $u > 0$,

\[
P\Big\{ \sup_{t \in T} X_t > E \sup_{t \in T} X_t + u \Big\} \le \exp\{-u^2/2\sigma_T^2\}
\]

and hence

\[
P\Big\{ \sup_{t \in T} |X_t| > E \sup_{t \in T} |X_t| + u \Big\} \le 2\exp\{-u^2/2\sigma_T^2\},
\]

where $\sigma_T^2 = \sup_{t \in T} \operatorname{var}(X_t)$.
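A small Monte Carlo experiment illustrates the first inequality of Lemma A.3 in the simplest setting of a finite index set. The sketch below takes $X_t$ to be independent centered Gaussian variables with given standard deviations (an assumption made only for this illustration), estimates $E \sup_t X_t$ by a sample mean and compares the empirical tail with $\exp\{-u^2/2\sigma_T^2\}$. The function names, sample size and seed are arbitrary choices.

```python
import math
import random

def max_samples(sigmas, n_draws, seed=0):
    """Draw n_draws realizations of max_t X_t with independent X_t ~ N(0, sigma_t^2)."""
    rng = random.Random(seed)
    return [max(rng.gauss(0.0, s) for s in sigmas) for _ in range(n_draws)]

def empirical_tail(samples, u):
    """Empirical P{sup X > mean(sup X) + u}, with E sup X replaced by the sample mean."""
    mean_sup = sum(samples) / len(samples)
    return sum(1 for m in samples if m > mean_sup + u) / len(samples)

def borell_tis_bound(sigmas, u):
    """exp(-u^2 / (2 sigma_T^2)) with sigma_T^2 = max_t var(X_t)."""
    sigma_t2 = max(s * s for s in sigmas)
    return math.exp(-u * u / (2 * sigma_t2))

sigmas = [1.0, 0.8, 0.5, 0.3]
samples = max_samples(sigmas, n_draws=20000, seed=1)
for u in (0.5, 1.0, 2.0):
    assert empirical_tail(samples, u) <= borell_tis_bound(sigmas, u)
```

In this example the bound is far from tight; its value lies in holding uniformly over the index set with a variance proxy that depends only on $\sup_t \operatorname{var}(X_t)$.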

Lemma A.4. Let a > 0 and aa < exp(1) − 1. Then,



/V 1 n 1 n--(i + , i „ ) d,,<ggm^' i :'°;' (1+ ;' ) {i + — i — } 

Jo V a ln(l + aa) \ 21nl n - 1 (l + J 



The proof is immediate. 